(57) Abstract

Rapid and highly efficient procedures for the total synthesis of linear,
double stranded DNA sequences in excess of about 200 base pairs in length,
which sequences may comprise entire structural genes. Novel sequences are
prepared from two or more DNA subunits provided with terminal regions
comprising restriction endonuclease cleavage sites facilitating insertion
of subunits into a selected vector for purposes of amplification during
the course of the total assembly process. The total, finally-assembled
sequences include at least one, and preferably two or more, unique
restriction endonuclease cleavage site(s) at intermediate positions along
the sequence, allowing for easy excision and replacement of subunits and
the correspondingly facile preparation of multiple structural analogs of
polypeptides coded for by the sequences. Manufactured genes preferably
include codons selected from among alternative codons specifying the same
amino acid on the basis of preferential expression in a projected host
microorganism (e.g., E. coli) to be transformed. Illustrated is the
preparation and expression of manufactured genes capable of directing
synthesis of human immune and leukocyte interferons and of other
biologically active proteinaceous products, which products differ from
naturally-occurring forms in terms of the identity and/or relative
position of one or more amino acids, and in terms of one or more
biological and pharmacological properties but which substantially retain
other such properties.
THE MANUFACTURE AND EXPRESSION
OF LARGE STRUCTURAL GENES

This is a continuation-in-part of co-pending
U.S. Patent Application Serial No. 375,494, filed
May 6, 1982.

The present invention relates generally
to the manipulation of genetic materials and, more
particularly, to the manufacture of specific DNA
sequences useful in recombinant procedures to secure
the production of proteins of interest.

Genetic materials may be broadly defined
as those chemical substances which program for and
guide the manufacture of constituents of cells and
viruses and direct the responses of cells and viruses.
A long chain polymeric substance known as deoxyribonucleic
acid (DNA) comprises the genetic material
of all living cells and viruses except for certain
viruses which are programmed by ribonucleic acids
(RNA). The repeating units in DNA polymers are four
different nucleotides, each of which consists of either
a purine (adenine or guanine) or a pyrimidine (thymine
or cytosine) bound to a deoxyribose sugar to which
a phosphate group is attached. Attachment of nucleotides
in linear polymeric form is by means of fusion of
the 5' phosphate of one nucleotide to the 3' hydroxyl
group of another. Functional DNA occurs in the form
of stable double stranded associations of single strands
of nucleotides (known as deoxyoligonucleotides), which
associations occur by means of hydrogen bonding between
purine and pyrimidine bases [i.e., "complementary"
associations existing either between adenine (A) and
thymine (T) or guanine (G) and cytosine (C)]. By
convention, nucleotides are referred to by the names
of their constituent purine or pyrimidine bases, and
the complementary associations of nucleotides in double
stranded DNA (i.e., A-T and G-C) are referred to as
"base pairs". Ribonucleic acid is a polynucleotide
comprising adenine, guanine, cytosine and uracil (U),
rather than thymine, bound to ribose and a phosphate
group.

Most briefly put, the programming function
of DNA is generally effected through a process wherein
specific DNA nucleotide sequences (genes) are "transcribed"
into relatively unstable messenger RNA (mRNA)
polymers. The mRNA, in turn, serves as a template
for the formation of structural, regulatory and catalytic
proteins from amino acids. This translation
process involves the operations of small RNA strands
(tRNA) which transport and align individual amino
acids along the mRNA strand to allow for formation
of polypeptides in proper amino acid sequences. The
mRNA "message", derived from DNA and providing the
basis for the tRNA supply and orientation of any given
one of the twenty amino acids for polypeptide "expression",
is in the form of triplet "codons" (sequential
groupings of three nucleotide bases). In one sense,
the formation of a protein is the ultimate form of
"expression" of the programmed genetic message provided
by the nucleotide sequence of a gene.
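
By way of illustration only, the flow from a DNA coding sequence to an mRNA "message" and then to a polypeptide can be sketched as follows. The sketch assumes the standard genetic code, reads the coding strand in the 5' to 3' direction, and uses one-letter amino acid symbols; the function names and the short example sequence are hypothetical and do not appear in the text above.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna):
    """Return the mRNA sequence matching a DNA coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(coding_dna):
    """Read successive triplet codons and return the encoded peptide.

    Translation stops at the first stop codon ("*"); trailing bases that
    do not form a complete triplet are ignored.
    """
    dna = coding_dna.upper()
    peptide = []
    for start in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


if __name__ == "__main__":
    gene = "ATGCTGAAATAA"  # hypothetical example: Met-Leu-Lys-stop
    print(transcribe(gene))
    print(translate(gene))
```

The codon table is generated rather than typed out so that all 64 triplets are covered without transcription errors.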

Certain DNA sequences which usually "precede"
a gene in a DNA polymer provide a site for initiation
of the transcription into mRNA. These are referred
to as "promoter" sequences. Other DNA sequences,
also usually "upstream" of (i.e., preceding) a gene
in a given DNA polymer, bind proteins that determine
the frequency (or rate) of transcription initiation.
These other sequences are referred to as "regulator"
sequences. Thus, sequences which precede a selected
gene (or series of genes) in a functional DNA polymer
and which operate to determine whether the transcription
(and eventual expression) of a gene will take place
are collectively referred to as "promoter/regulator"
or "control" DNA sequences. DNA sequences which "follow"
a gene in a DNA polymer and provide a signal for termination
of the transcription into mRNA are referred to
as "terminator" sequences.

A focus of microbiological processing for
nearly the last decade has been the attempt to manufacture
industrially and pharmaceutically significant
substances using organisms which do not initially have
genetically coded information concerning the desired
product included in their DNA. Simply put, a gene
that specifies the structure of a product is either
isolated from a "donor" organism or chemically synthesized
and then stably introduced into another organism
which is preferably a self-replicating unicellular
microorganism. Once this is done, the existing machinery
for gene expression in the "transformed" host cells
operates to construct the desired product.

The art is rich in patent and literature
publications relating to "recombinant DNA" methodologies
for the isolation, synthesis, purification and amplification
of genetic materials for use in the transformation
of selected host organisms. U.S. Letters Patent No.
4,237,224 to Cohen, et al., for example, relates to
transformation of procaryotic unicellular host organisms
with "hybrid" viral or circular plasmid DNA which
includes selected exogenous DNA sequences. The procedures
of the Cohen, et al. patent first involve manufacture
of a transformation vector by enzymatically cleaving
viral or circular plasmid DNA to form linear DNA
strands. Selected foreign DNA strands are also prepared
in linear form through use of similar enzymes. The
linear viral or plasmid DNA is incubated with the
foreign DNA in the presence of ligating enzymes capable
of effecting a restoration process and "hybrid" vectors
are formed which include the selected foreign DNA
segment "spliced" into the viral or circular DNA plasmid.

Transformation of compatible unicellular
host organisms with the hybrid vector results in the
formation of multiple copies of the foreign DNA in
the host cell population. In some instances, the
desired result is simply the amplification of the
foreign DNA and the "product" harvested is DNA. More
frequently, the goal of transformation is the expression
by the host cells of the foreign DNA in the form of
large scale synthesis of isolatable quantities of
commercially significant protein or polypeptide fragments
coded for by the foreign DNA. See also, e.g., U.S.
Letters Patent Nos. 4,269,731 (to Shine), 4,273,875
(to Manis) and 4,293,652 (to Cohen).

The success of procedures such as described
in the Cohen, et al. patent is due in large part to
the ready availability of "restriction endonuclease"
enzymes which facilitate the site-specific cleavage
of both the unhybridized DNA vector and, e.g., eukaryotic
DNA strands containing the foreign sequences of interest.
Cleavage in a manner providing for the formation of
single stranded complementary "ends" on the double
stranded linear DNA strands greatly enhances the likelihood
of functional incorporation of the foreign DNA
into the vector upon "ligating" enzyme treatment.
A large number of such restriction endonuclease enzymes
are currently commercially available. [See, e.g., "BRL
Restriction Endonuclease Reference Chart" appearing
in the "'81/'82 Catalog" of Bethesda Research Laboratories,
Inc., Gaithersburg, Maryland.] Verification
of hybrid formation is facilitated by chromatographic
techniques which can, for example, distinguish the
hybrid plasmids from non-hybrids on the basis of molecular
weight. Other useful verification techniques involve
radioactive DNA hybridization.

Another manipulative "tool" largely responsible
for successes in transformation of procaryotic cells
is the use of selectable "marker" gene sequences.
Briefly put, hybrid vectors are employed which contain,
in addition to the desired foreign DNA, one or more
DNA sequences which code for expression of a phenotypic
trait capable of distinguishing transformed from
non-transformed host cells. Typical marker gene sequences
are those which allow a transformed procaryotic cell
to survive and propagate in a culture medium containing
metals, antibiotics, and like components which would
kill or severely inhibit propagation of non-transformed
host cells.

Successful expression of an exogenous gene
in a transformed host microorganism depends to a great
extent on incorporation of the gene into a transformation
vector with a suitable promoter/regulator region present
to insure transcription of the gene into mRNA and
other signals which insure translation of the mRNA
message into protein (e.g., ribosome binding sites).
It is not often the case that the "original"
promoter/regulator region of a gene will allow for high levels
of expression in the new host. Consequently, the
gene to be inserted must either be fitted with a new,
host-accommodated transcription and translation regulating
DNA sequence prior to insertion or it must
be inserted at a site where it will come under the
control of existing transcription and translation
signals in the vector DNA.

It is frequently the case that the insertion
of an exogenous gene into, e.g., a circular DNA plasmid
vector is performed at a site either immediately
following an extant transcription and translation
signal or within an existing plasmid-borne gene coding
for a rather large protein which is the subject of
high degrees of expression in the host. In the latter
case, the host's expression of the "fusion gene" so
formed results in high levels of production of a "fusion
protein" including the desired protein sequence (e.g.,
as an intermediate segment which can be isolated by
chemical cleavage of large protein). Such procedures
not only insure desired regulation and high levels
of expression of the exogenous gene product but also
result in a degree of protection of the desired protein
product from attack by proteases endogenous to the
host. Further, depending on the host organism, such
procedures may allow for a kind of "piggyback" transportation
of the desired protein from the host cells into
the cell culture medium, eliminating the need to destroy
host cells for the purpose of isolating the desired
product.

While the foregoing generalized descriptions
of published recombinant DNA methodologies may make
the processes appear to be rather straightforward,
easily performed and readily verified, it is actually
the case that the DNA sequence manipulations involved
are quite painstakingly difficult to perform and almost
invariably characterized by very low yields of desired
products.

As an example, the initial "preparation"
of a gene for insertion into a vector to be used in
transformation of a host microorganism can be an enormously
difficult process, especially where the gene
to be expressed is endogenous to a higher organism
such as man. One laborious procedure practiced in
the art is the systematic cloning into recombinant
plasmids of the total DNA genome of the "donor" cells,
generating immense "libraries" of transformed cells
carrying random DNA sequence fragments which must
be individually tested for expression of a product
of interest. According to another procedure, total
mRNA is isolated from high expression donor cells
(presumptively containing multiple copies of mRNA
coding for the product of interest), first "copied"
into single stranded cDNA with reverse transcriptase
enzymes, then into double stranded form with polymerase,
and cloned. The procedure again generates a library
of transformed cells, somewhat smaller than a total
genome library, which may include the desired gene
copies free of non-transcribed "introns" which can
significantly interfere with expression by a host
microorganism. The above-noted time-consuming gene
isolation procedures were in fact employed in published
recombinant DNA procedures for obtaining microorganism
expression of several proteins, including rat proinsulin
[Ullrich, et al., Science, 196, pp. 1313-1318 (1977)],
human fibroblast interferon [Goeddel, et al., Nucleic
Acids Research, 8, pp. 4087-4094 (1980)], mouse β-endorphin
[Shine, et al., Nature, 285, pp. 456-461 (1980)]
and human leukocyte interferon [Goeddel, et al., Nature,
287, pp. 411-416 (1980); and Goeddel, et al., Nature,
290, pp. 20-26 (1981)].

Whenever possible, the partial or total
manufacture of genes of interest from nucleotide bases
constitutes a much preferred procedure for preparation
of genes to be used in recombinant DNA methods. A
requirement for such manufacture is, of course, knowledge
of the correct amino acid sequence of the desired
polypeptide. With this information in hand, a generative
DNA sequence code for the protein (i.e., a properly
ordered series of base triplet codons) can be planned
and a corresponding synthetic, double stranded DNA
segment can be constructed. A combination of manufacturing
and cDNA synthetic methodologies is reported
to have been employed in the generation of a gene
for human growth hormone. Specifically, a manufactured
linear double stranded DNA sequence of 72 nucleotide
base pairs (comprising codons specifying the first
24 amino acids of the desired 191 amino acid polypeptide)
was ligated to a cDNA-derived double strand coding
for amino acids Nos. 25-191 and inserted in a modified
pBR322 plasmid at a locus controlled by a lac
promoter/regulator sequence [Goeddel, et al., Nature, 281,
pp. 544-548 (1981)].

Completely synthetic procedures have been
employed for the manufacture of genes coding for relatively
"short" biologically functional polypeptides,
such as human somatostatin (14 amino acids) and human
insulin (2 polypeptide chains of 21 and 30 amino acids,
respectively).

In the somatostatin gene preparative procedure
[Itakura, et al., Science, 198, pp. 1056-1063 (1977)]
a 52 base pair gene was constructed wherein 42 base
pairs represented the codons specifying the required
14 amino acids and an additional 10 base pairs were
added to permit formation of "sticky-end" single stranded
terminal regions employed for ligating the structural
gene into a microorganism transformation vector.
Specifically, the gene was inserted close to the end
of a β-galactosidase enzyme gene and the resultant
fusion gene was expressed as a fusion protein from
which somatostatin was isolated by cyanogen bromide
cleavage. Manufacture of the human insulin gene,
as noted above, involved preparation of genes coding
for a 21 amino acid chain and for a 30 amino acid
chain. Eighteen deoxyoligonucleotide fragments were
combined to make the gene for the longer chain, and
eleven fragments were joined into a gene for the shorter
chain. Each gene was employed to form a fusion gene
with a β-galactosidase gene and the individually expressed
polypeptide chains were enzymatically isolated
and linked to form complete insulin molecules. [Goeddel,
et al., Proc. Nat. Acad. Sci. U.S.A., 76, pp. 106-110
(1979).]

In each of the above procedures, deoxyoligonucleotide
segments were prepared, and then sequentially
ligated according to the following general procedure.
[See, e.g., Agarwal, et al., Nature, 227, pp. 1-7
(1970) and Khorana, Science, 203, pp. 614-675 (1979)].
An initial "top" (i.e., 5' to 3' polarity) deoxyoligonucleotide
segment is enzymatically joined to a second "top"
segment. Alignment of these two "top" strands is
made possible using a "bottom" (i.e., 3' to 5' polarity)
strand having a base sequence complementary to half
of the first top strand and half of the second top
strand. After joining, the uncomplemented bases of
the top strands "protrude" from the duplex portion
formed. A second bottom strand is added which includes
the five or six base complement of a protruding top
strand, plus an additional five or six bases which
then protrude as a bottom single stranded portion.
The two bottom strands are then joined. Such sequential
additions are continued until a complete gene
sequence is developed, with the total procedure being
very time-consuming and highly inefficient.
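
The role of the first "bottom" strand in aligning two "top" strands can be sketched as follows. This is a minimal illustration only: it assumes that "half" means the 3' half of the first top strand and the 5' half of the second, and that every strand is written 5' to 3'. The function names and the two twelve-base top strands are hypothetical and are not taken from the cited procedures.

```python
# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def bridging_bottom_strand(top1, top2):
    """Design a bottom strand spanning the junction between two top strands.

    The bottom strand pairs with the 3' half of top1 and the 5' half of
    top2, holding the two top strands end to end so they can be joined.
    """
    tail_of_first = top1[len(top1) // 2:]
    head_of_second = top2[:len(top2) // 2]
    return reverse_complement(tail_of_first + head_of_second)


if __name__ == "__main__":
    top1 = "ATGGCTAAAGGT"  # hypothetical first top strand
    top2 = "CCGTTAGACTGA"  # hypothetical second top strand
    print(bridging_bottom_strand(top1, top2))
```

In this arrangement the 5' half of the first top strand and the 3' half of the second remain unpaired, corresponding to the "protruding" bases to which the next bottom strand would anneal.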

The time-consuming characteristics of such
methods for total gene synthesis are exemplified by
reports that three months' work by at least four investigators
was needed to perform the assembly of the
two "short" insulin genes previously referred to.
Further, while only relatively small quantities of
any manufactured gene are needed for success of vector
insertion, the above synthetic procedures have such
poor overall yields (on the order of 20% per ligation)
that the eventual isolation of even minute quantities
of a selected short gene is by no means guaranteed
with even the most scrupulous adherence to prescribed
methods. The maximum length gene which can be synthesized
is clearly limited by the efficiency with which
the individual short segments can be joined. If n
such ligation reactions are required and the yield
of each such reaction is y, the quantity of correctly
synthesized genetic material obtained will be proportional
to y^n. Since this relationship is exponential
in nature, even a small increase in the yield per
ligation reaction will result in a substantial increase
in the length of the largest gene that may be synthesized.
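
The exponential dependence of overall yield on the per-ligation yield can be made concrete with a short calculation. In the sketch below, y is the yield of each ligation and n the number of ligations, so the overall yield is y^n. The 20% figure is the one quoted above; the 40% comparison and the minimum usable fraction of 1 part in 10,000 are arbitrary assumptions chosen only to show the trend.

```python
def overall_yield(y, n):
    """Fraction of correctly assembled material after n ligations of yield y."""
    return y ** n


def max_ligations(y, min_fraction):
    """Largest number of ligations for which y**n stays at or above min_fraction."""
    if not 0 < y < 1:
        raise ValueError("per-ligation yield must lie strictly between 0 and 1")
    n = 0
    remaining = 1.0
    while remaining * y >= min_fraction:
        remaining *= y
        n += 1
    return n


if __name__ == "__main__":
    threshold = 1e-4  # assumed minimum usable fraction, for illustration only
    for y in (0.2, 0.4):
        n = max_ligations(y, threshold)
        print(f"yield {y:.0%} per ligation: up to {n} ligations "
              f"(overall yield {overall_yield(y, n):.2e})")
```

Under these assumptions, doubling the per-ligation yield from 20% to 40% doubles the number of ligations that can be tolerated, from five to ten.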

Inefficiencies in the above-noted methodology
are due in large part to the formation of undesired
intermediate products. As an example, in an initial
reaction forming annealed top strands associated with
a bottom, "template" strand, the desired reaction
may be

[Reaction scheme garbled in the source; strands labeled a, b and c.]

but the actual products obtained may be

[Product schemes garbled in the source; products drawn from strands a and b.]

or the like. Further, the longer the individual
deoxyoligonucleotides are, the more likely it is that they
will form thermodynamically stable self-associations
such as "hairpins" or aggregations.

Proposals for increasing synthetic efficiency
have not been forthcoming and it was recently reported
that, "With the methods now available, however, it
is not economically practical to synthesize genes
for peptides longer than about 30 amino acid units,
and many clinically important proteins are much longer".
[Aharonowitz, et al., Scientific American, 245, No.
3, pp. 140-152, at p. 151 (1981).]

An illustration of the "economic practicalities"
involved in large gene synthesis is provided
by the recent publication of "successful" efforts
in the total synthesis of a human leukocyte interferon
gene. [Edge, et al., Nature, 292, pp. 756-782 (1981).]
Briefly summarized, 67 different deoxyoligonucleotides
containing about 15 bases were synthesized and joined
in the "50 percent overlap" procedure of the type
noted above to form eleven short duplexes. These,
in turn, were assembled into four longer duplexes which
were eventually joined to provide a 514 base pair
gene coding for the 166 amino acid protein. The procedure,
which the authors characterize as "rapid", is
reliably estimated to have consumed nearly a year's
effort by five workers and the efficiency of the assembly
strategy was clearly quite poor. It may be noted,
for example, that while 40 pmole of each of the starting
67 deoxyoligonucleotides was prepared and employed
to form the eleven intermediate-sized duplexes, by
the time assembly of the four large duplexes was achieved
a yield of only about 0.01 pmole of the longer duplexes
could be obtained for use in final assembly of the
whole gene.
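
The figures quoted above can be put in proportion with a few lines of arithmetic. The sketch uses only the numbers given in the text (40 pmole starting material, about 0.01 pmole recovered, a 514 base pair gene, a 166 amino acid protein); the variable names are hypothetical, and what the base pairs beyond the coding triplets comprise is not stated here.

```python
START_PMOLE = 40.0       # amount of each starting deoxyoligonucleotide
RECOVERED_PMOLE = 0.01   # approximate amount of the longer duplexes obtained
GENE_BASE_PAIRS = 514    # length of the assembled gene
PROTEIN_RESIDUES = 166   # length of the encoded protein

# Fraction of the starting molar quantity surviving to the large duplexes.
recovery_fraction = RECOVERED_PMOLE / START_PMOLE

# Base pairs needed for the codons alone, at three bases per amino acid.
coding_base_pairs = 3 * PROTEIN_RESIDUES

# Remaining base pairs not accounted for by amino acid codons.
extra_base_pairs = GENE_BASE_PAIRS - coding_base_pairs

if __name__ == "__main__":
    print(f"recovery: {recovery_fraction:.4%} of starting material")
    print(f"coding region: {coding_base_pairs} bp, other: {extra_base_pairs} bp")
```

On these figures only about one part in four thousand of the starting molar quantity was carried through to the four large duplexes.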

Another aspect of the practice of recombinant
DNA techniques for the expression, by microorganisms,
of proteins of industrial and pharmaceutical interest
is the phenomenon of "codon preference". While it
was earlier noted that the existing machinery for
gene expression in genetically transformed host cells
will "operate" to construct a given desired product,
levels of expression attained in a microorganism can
be subject to wide variation, depending in part on
specific alternative forms of the amino acid-specifying
genetic code present in an inserted exogenous gene.
A "triplet" codon of four possible nucleotide bases
can exist in 64 variant forms. That these forms provide
the message for only 20 different amino acids (as
well as translation initiation and termination)
means that some amino acids can be coded for by more
than one codon. Indeed, some amino acids have as
many as six "redundant", alternative codons while
some others have a single, required codon. For reasons
not completely understood, alternative codons are
not at all uniformly present in the endogenous DNA
of differing types of cells and there appears to exist
a variable natural hierarchy or "preference" for certain
codons in certain types of cells.
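
The redundancy described above can be checked by enumerating the code. The sketch below assumes the standard genetic code, written with DNA bases and one-letter amino acid symbols ("*" marks a termination codon); the names CODON_TABLE and SYNONYMS are illustrative only.

```python
from collections import defaultdict

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group the 64 codons by the amino acid (or stop signal) each specifies.
SYNONYMS = defaultdict(list)
for codon, residue in CODON_TABLE.items():
    SYNONYMS[residue].append(codon)

if __name__ == "__main__":
    print("codons in total:", len(CODON_TABLE))
    for residue in sorted(SYNONYMS, key=lambda r: -len(SYNONYMS[r])):
        print(residue, len(SYNONYMS[residue]), " ".join(sorted(SYNONYMS[residue])))
```

The enumeration gives six alternative codons each for leucine, serine and arginine, and a single codon each for methionine and tryptophan.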

As one example, the amino acid leucine is
specified by any of six DNA codons including CTA,
CTC, CTG, CTT, TTA, and TTG (which correspond, respectively,
to the mRNA codons, CUA, CUC, CUG, CUU, UUA
and UUG). Exhaustive analysis of genome codon frequencies
for microorganisms has revealed endogenous DNA
of E. coli bacteria most commonly contains the CTG
leucine-specifying codon, while the DNA of yeasts
and slime molds most commonly includes a TTA
leucine-specifying codon. In view of this hierarchy, it is
generally held that the likelihood of obtaining high
levels of expression of a leucine-rich polypeptide
by an E. coli host will depend to some extent on the
frequency of codon use. For example, a gene rich
in TTA codons will in all probability be poorly expressed
in E. coli, whereas a CTG rich gene will probably
highly express the polypeptide. In a like manner,
when yeast cells are the projected transformation
host cells for expression of a leucine-rich polypeptide,
a preferred codon for use in an inserted DNA would
be TTA. See, e.g., Grantham, et al., Nucleic Acids
Research, 8, pp. r49-r62 (1980); Grantham, et al.,
Nucleic Acids Research, 8, pp. 1893-1912 (1980);
and Grantham, et al., Nucleic Acids Research, 9,
pp. r43-r74 (1981).
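
The leucine example lends itself to a simple illustration of codon selection for a projected host. The sketch below uses only the two preferences stated above (CTG for E. coli, TTA for yeast). The function names and the short test gene are hypothetical, and only leucine codons are recoded because no other preferences are given in the text.

```python
LEUCINE_CODONS = {"CTA", "CTC", "CTG", "CTT", "TTA", "TTG"}

# Most commonly used leucine codon in each host, as stated in the text.
PREFERRED_LEUCINE = {"E. coli": "CTG", "yeast": "TTA"}


def split_codons(dna):
    """Split a coding sequence into complete triplets, dropping any remainder."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def preferred_leucine_fraction(dna, host):
    """Fraction of leucine codons that match the host's preferred codon."""
    leucines = [c for c in split_codons(dna) if c in LEUCINE_CODONS]
    if not leucines:
        return None  # no leucine codons present
    preferred = PREFERRED_LEUCINE[host]
    return sum(c == preferred for c in leucines) / len(leucines)


def recode_leucine(dna, host):
    """Replace every leucine codon with the host's preferred leucine codon."""
    preferred = PREFERRED_LEUCINE[host]
    return "".join(
        preferred if c in LEUCINE_CODONS else c for c in split_codons(dna)
    )


if __name__ == "__main__":
    gene = "ATGTTACTGTTAAAA"  # hypothetical: Met-Leu-Leu-Leu-Lys
    print(preferred_leucine_fraction(gene, "E. coli"))
    print(recode_leucine(gene, "E. coli"))
```

Because all six codons specify leucine, the recoded sequence directs synthesis of the same polypeptide; only the codon usage changes.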

The implications of codon preference phenomena
on recombinant DNA techniques are manifest, and the
phenomenon may serve to explain many prior failures
to achieve high expression levels for exogenous genes
in successfully transformed host organisms: a less
"preferred" codon may be repeatedly present in the
inserted gene and the host cell machinery for expression
may not operate as efficiently. This phenomenon directs
the conclusion that wholly manufactured genes which
have been designed to include a projected host cell's
preferred codons provide a preferred form of foreign
genetic material for practice of recombinant DNA techniques.
In this context, the absence of procedures
for rapid and efficient total gene manufacture which
would permit codon selection is seen to constitute
an even more serious roadblock to advances in the
art.

Of substantial interest to the background
of the present invention is the state of the art with
regard to the preparation and use of a class of biologically
active substances, the interferons (IFNs).
Interferons are secreted proteins having fairly well-defined
antiviral, antitumor and immunomodulatory
characteristics. See, e.g., Gray, et al., Nature,
295, pp. 503-508 (1982) and Edge, et al., supra, and
references cited therein.
On the basis of antigenicity and biological
and chemical properties, human interferons have been
grouped into three major classes: IFN-α (leukocyte),
IFN-β (fibroblast) and IFN-γ (immune). Considerable
information has accumulated on the structures and
properties of the virus-induced acid-stable interferons
(IFN-α and β). These have been purified to homogeneity
and at least partial amino acid sequences have been
determined. Analyses of cloned cDNA and gene sequences
for IFN-β1 and the IFN-α multigene family have permitted
the deduction of the complete amino acid sequences
of many of the interferons. In addition, efficient
synthesis of IFN-β1 and several IFN-αs in E. coli,
and IFN-α1 in yeast, have now made possible the purification
of large quantities of these proteins in biologically
active form.

Much less information is available concerning
the structure and properties of IFN-γ, an interferon
generally produced in cultures of lymphocytes exposed
to various mitogenic stimuli. It is acid labile
and does not cross-react with antisera prepared against
IFN-α or IFN-β. A broad range of biological activities
have been attributed to IFN-γ including potentiation
of the antiviral activities of IFN-α and -β, from
which it differs in terms of its virus and cell specificities
and the antiviral mechanisms induced. In vitro
studies performed with crude preparations suggest
that the primary function of IFN-γ may be as an immunoregulatory
agent. The antiproliferative effect of
IFN-γ on transformed cells has been reported to be
10 to 100-fold greater than that of IFN-α or -β, suggesting
a potential use in the treatment of neoplasia.
Murine IFN-γ preparations have been shown to have
significant antitumor activity against mouse sarcomas.

It has recently been reported (Gray, et
al., supra) that a recombinant plasmid containing
a cDNA sequence coding for human IFN-γ has been isolated
and characterized. Expression of this sequence in
E. coli and cultured monkey cells is reported to give
rise to a polypeptide having the properties of authentic
human IFN-γ. In the publication, the cDNA sequence
and the deduced 146 amino acid sequence of the "mature"
polypeptide, exclusive of the putative leader sequence,
is as follows:

1                                   10
Cys-Tyr-Cys-Gln-Asp-Pro-Tyr-Val-Lys-Glu-Ala-Glu-Asn-Leu-
TGT TAC TGC CAG GAC CCA TAT GTA AAA GAA GCA GAA AAC CTT

                    20
Lys-Lys-Tyr-Phe-Asn-Ala-Gly-His-Ser-Asp-Val-Ala-Asp-Asn-
AAG AAA TAT TTT AAT GCA GGT CAT TCA GAT GTA GCG GAT AAT

    30                                      40
Gly-Thr-Leu-Phe-Leu-Gly-Ile-Leu-Lys-Asn-Trp-Lys-Glu-Glu-
GGA ACT CTT TTC TTA GGC ATT TTG AAG AAT TGG AAA GAG GAG

                            50
Ser-Asp-Arg-Lys-Ile-Met-Gln-Ser-Gln-Ile-Val-Ser-Phe-Tyr-
AGT GAC AGA AAA ATA ATG CAG AGC CAA ATT GTC TCC TTT TAC

            60                                      70
Phe-Lys-Leu-Phe-Lys-Asn-Phe-Lys-Asp-Asp-Gln-Ser-Ile-Gln-
TTC AAA CTT TTT AAA AAC TTT AAA GAT GAC CAG AGC ATC CAA

                                    80
Lys-Ser-Val-Glu-Thr-Ile-Lys-Glu-Asp-Met-Asn-Val-Lys-Phe-
AAG AGT GTG GAG ACC ATC AAG GAA GAC ATG AAT GTC AAG TTT

                    90
Phe-Asn-Ser-Asn-Lys-Lys-Lys-Arg-Asp-Asp-Phe-Glu-Lys-Leu-
TTC AAT AGC AAC AAA AAG AAA CGA GAT GAC TTC GAA AAG CTG

    100                                     110
Thr-Asn-Tyr-Ser-Val-Thr-Asp-Leu-Asn-Val-Gln-Arg-Lys-Ala-
ACT AAT TAT TCG GTA ACT GAC TTG AAT GTC CAA CGC AAA GCA

                            120
Ile-His-Glu-Leu-Ile-Gln-Val-Met-Ala-Glu-Leu-Ser-Pro-Ala-
ATA CAT GAA CTC ATC CAA GTG ATG GCT GAA CTG TCG CCA GCA

            130                                     140
Ala-Lys-Thr-Gly-Lys-Arg-Lys-Arg-Ser-Gln-Met-Leu-Phe-Gln-
GCT AAA ACA GGG AAG CGA AAA AGG AGT CAG ATG CTG TTT CAA

Gly-Arg-Arg-Ala-Ser-Gln
GGT CGA AGA GCA TCC CAG.

In a previous publication of the sequence,
arginine, rather than glutamine, was specified at
position 140 in the sequence. (Unless otherwise indicated,
therefore, reference to "human immune interferon"
or, simply "IFN-γ" shall comprehend both the [Arg¹⁴⁰]
and [Gln¹⁴⁰] forms.)

The above-noted wide variations in biological
activities of various interferon types make the construction
of synthetic polypeptide analogs of the interferons
of paramount significance to the full development
of the therapeutic potential of this class of compounds.
Despite the advantages in isolation of quantities
of interferons which have been provided by recombinant
DNA techniques to date, practitioners in this field
have not been able to address the matter of preparation
of synthetic polypeptide analogs of the interferons
with any significant degree of success.

Put another way, the work of Gray, et al.,
supra, in the isolation of a gene coding for IFN-γ
and the extensive labors of Edge, et al., supra, in
providing a wholly manufactured IFN-α1 gene provide
only genetic materials for expression of single, very
precisely defined, polypeptide sequences. There exist
no procedures (except, possibly, for site specific
mutagenesis) which would permit microbial expression
of large quantities of human IFN-γ analogs which differed
from the "authentic" polypeptide in terms of
the identity or location of even a single amino acid.
In a like manner, preparation of an IFN-α1 analog
which differed by one amino acid from the polypeptide
prepared by Edge, et al., supra, would appear to require
an additional year of labor in constructing a whole
new gene which varied in terms of a single triplet
codon. No means is readily available for the excision
of a fragment of the subject gene and replacement
with a fragment including the coding information for
a variant polypeptide sequence. Further, modification
of the reported cDNA-derived and manufactured DNA
sequences to vary codon usage is not an available
"option".

Indeed, the only report of the preparation
of variant interferon polypeptide species by recombinant
DNA techniques has been in the context of preparation
and expression of "hybrids" of human genes for IFN-α1
and IFN-α2 [Weck, et al., Nucleic Acids Research,
9, pp. 6153-6168 (1981) and Streuli, et al., Proc.
Nat. Acad. Sci. U.S.A., 78, pp. 2848-2852 (1981)].
The hybrids obtained consisted of the four possible
combinations of gene fragments developed upon finding
that two of the eight human (cDNA-derived) genes fortuitously
included, only once within the sequence,
base sequences corresponding to the restriction endonuclease
cleavage sites for the bacterial endonucleases
PvuII and BglII.

There exists, therefore, a substantial need in the art
for more efficient procedures for the total synthesis from
nucleotide bases of manufactured DNA sequences coding for
large polypeptides such as the interferons. There
additionally exists a need for synthetic methods which will
allow for the rapid construction of variant forms of
synthetic sequences such as will permit the microbial
expression of synthetic polypeptides which vary from
naturally occurring forms in terms of the identity and/or
position of one or more selected amino acids.



BRIEF SUMMARY

The present invention provides novel, rapid and highly
efficient procedures for the total synthesis of linear,
double stranded DNA sequences in excess of about 200
nucleotide base pairs in length, which sequences may
comprise entire structural genes capable of directing the
synthesis of a wide variety of polypeptides of interest.

According to the invention, linear, double stranded DNA
sequences of a length in excess of about 200 base pairs
and coding for expression of a predetermined continuous
sequence of amino acids within a selected host microorganism
transformed by a selected DNA vector including the sequence,
are synthesized by a method comprising:

(a) preparing two or more different, subunit, linear,
double stranded DNA sequences of about 100 or more base
pairs in length for assembly in a selected assembly vector,

each different subunit DNA sequence prepared comprising
a series of nucleotide base codons coding for a different
continuous portion of said predetermined sequence of amino
acids to be expressed,

one terminal region of a first of said subunits comprising
a portion of a base sequence which provides a recognition
site for cleavage by a first restriction endonuclease,
which recognition site is entirely present either once or
not at all in said selected assembly vector upon insertion
of the subunit therein,

one terminal region of a second of said subunits comprising
a portion of a base sequence which provides a recognition
site for cleavage by a second restriction endonuclease
other than said first endonuclease, which recognition site
is entirely present once or not at all in said selected
assembly vector upon insertion of the subunit therein,

at least one-half of all remaining terminal regions of
subunits comprising a portion of a recognition site
(preferably a palindromic six base recognition site) for
cleavage by a restriction endonuclease other than said
first and second endonucleases, which recognition site is
entirely present once and only once in said selected
assembly vector after insertion of all subunits thereinto;
and

(b) serially inserting each of said subunit DNA sequences
prepared in step (a) into the selected assembly vector and
effecting the biological amplification of the assembly
vector subsequent to each insertion, thereby to form a DNA
vector including the desired DNA sequence coding for the
predetermined continuous amino acid sequence and wherein
the desired DNA sequence assembled includes at least one
unique, preferably palindromic six base, recognition site
for restriction endonuclease cleavage at an intermediate
position therein.

The above general method preferably further includes the
step of isolating the desired DNA sequence from the assembly
vector, preferably to provide one of the class of novel
manufactured DNA sequences having at least one unique
palindromic six base recognition site for restriction
endonuclease cleavage at an intermediate position therein.
A sequence so isolated may then be inserted in a different,
"expression" vector and direct expression of the desired
polypeptide by a microorganism which is the same as or
different from that in which the assembly vector is
amplified. In other preferred embodiments of the method:
at least three different subunit DNA sequences are prepared
in step (a) and serially inserted into said selected
assembly vector in step (b) and the desired manufactured
DNA sequence obtained includes at least two unique
palindromic six base recognition sites for restriction
endonuclease cleavage at intermediate positions therein;
the DNA sequence synthesized comprises an entire structural
gene coding for a biologically active polypeptide; and,
in the DNA sequence manufactured, the sequence of nucleotide
bases includes one or more codons selected from among
alternative codons specifying the same amino acid, on the
basis of preferential expression characteristics of the
codon in said selected host microorganism.

Novel products of the invention include manufactured,
linear, double stranded DNA sequences of a length in excess
of about 200 base pairs and coding for the expression of
a predetermined continuous sequence of amino acids by a
selected host microorganism transformed with a selected
DNA vector including the sequence, characterized by having
at least one unique palindromic six base recognition site
for restriction endonuclease cleavage at an intermediate
position therein. Also included are polypeptide products
of the expression by an organism of such manufactured
sequences.

Illustratively provided by the present invention are novel
manufactured genes coding for the synthesis of human immune
interferon (IFN-γ) and novel biologically functional analog
polypeptides which differ from human immune interferon in
terms of the identity and/or location of one or more amino
acids. Also provided are manufactured genes coding for
synthesis of human leukocyte interferon of the F subtype
("LeIFN-F" or "IFN-αF") and analogs thereof, along with
consensus human leukocyte interferons.

DNA subunit sequences for use in practice of the methods
of the invention are preferably synthesized from nucleotide
bases according to the methods disclosed in co-owned,
concurrently-filed U.S. Patent Application Serial
No. 375,493, by Yitzhak Stabinsky, entitled "Manufacture
and Expression of Structural Genes" (Attorney's Docket
No. 6250). Briefly summarized, the general method comprises
the steps of:

(1) preparing two or more different, linear, duplex DNA
strands, each duplex strand including a double stranded
region of 12 or more selected complementary base pairs and
further including a top single stranded terminal sequence
of from 3 to 7 selected bases at one end of the strand
and/or a bottom single stranded terminal sequence of from
3 to 7 selected bases at the other end of the strand, each
single stranded terminal sequence of each duplex DNA strand
comprising the entire base complement of at most one single
stranded terminal sequence of any other duplex DNA strand
prepared; and

(2) annealing each duplex DNA strand prepared in step (1)
to one or two different duplex strands prepared in step (1)
having a complementary single stranded terminal sequence,
thereby to form a single continuous double stranded DNA
sequence which has a duplex region of at least 27 selected
base pairs including at least 3 base pairs formed by
complementary association of single stranded terminal
sequences of duplex DNA strands prepared in step (1) and
which has from 0 to 2 single stranded top or bottom terminal
regions of from 3 to 7 bases.

In the preferred general process for subunit manufacture,
at least three different duplex DNA strands are prepared
in step (1) and all strands so prepared are annealed
concurrently in a single annealing reaction mixture to form
a single continuous double stranded DNA sequence which has
a duplex region of at least 42 selected base pairs including
at least two non-adjacent sets of 3 or more base pairs
formed by complementary association of single stranded
terminal sequences of duplex strands prepared in step (1).

The duplex DNA strand preparation step (1) of the preferred
subunit manufacturing process preferably comprises the
steps of:

(a) constructing first and second linear
deoxyoligonucleotide segments having 15 or more bases in
a selected linear sequence, the linear sequence of bases
of the second segment comprising the total complement of
the sequence of bases of the first segment except that at
least one end of the second segment shall either include
an additional linear sequence of from 3 to 7 selected bases
beyond those fully complementing the first segment, or
shall lack a linear sequence of from 3 to 7 bases
complementary to a terminal sequence of the first segment,
provided, however, that the second segment shall not have
an additional sequence of bases or be lacking a sequence
of bases at both of its ends; and,

(b) combining the first and second segments under
conditions conducive to complementary association between
segments to form a linear, duplex DNA strand.

The sequence of bases in the double stranded DNA subunit
sequences formed preferably includes one or more triplet
codons selected from among alternative codons specifying
the same amino acid on the basis of preferential expression
characteristics of the codon in a projected host
microorganism, such as yeast cells or bacteria, especially
E. coli bacteria.

Also provided by the present invention are improvements
in methods and materials for enhancing levels of expression
of selected exogenous genes in E. coli host cells. Briefly
stated, expression vectors are constructed to include
selected DNA sequences upstream of polypeptide coding
regions which selected sequences are duplicative of ribosome
binding site sequences extant in genomic E. coli DNA
associated with highly expressed endogenous polypeptides.
A presently preferred selected sequence is duplicative of
the ribosome binding site sequence associated with E. coli
expression of outer membrane protein F ("OMP-F").

Other aspects and advantages of the present 
invention will be apparent upon consideration of the 
following detailed description thereof. 

DETAILED DESCRIPTION

As employed herein, the term "manufactured" as applied to
a DNA sequence or gene shall designate a product either
totally chemically synthesized by assembly of nucleotide
bases or derived from the biological replication of a
product thus chemically synthesized. As such, the term is
exclusive of products "synthesized" by cDNA methods or
genomic cloning methodologies which involve starting
materials which are of biological origin. Table I below
sets out abbreviations employed herein to designate amino
acids and includes IUPAC-recommended single letter
designations.



TABLE I

Amino Acid        Abbreviation    IUPAC Symbol

Alanine           Ala             A
Cysteine          Cys             C
Aspartic acid     Asp             D
Glutamic acid     Glu             E
Phenylalanine     Phe             F
Glycine           Gly             G
Histidine         His             H
Isoleucine        Ile             I
Lysine            Lys             K
Leucine           Leu             L
Methionine        Met             M
Asparagine        Asn             N
Proline           Pro             P
Glutamine         Gln             Q
Arginine          Arg             R
Serine            Ser             S
Threonine         Thr             T
Valine            Val             V
Tryptophan        Trp             W
Tyrosine          Tyr             Y

The following abbreviations shall be employed for
nucleotide bases: A for adenine; G for guanine; T for
thymine; U for uracil; and C for cytosine. For ease of
understanding of the present invention, Tables II and III
below provide tabular correlations between the 64 alternate
triplet nucleotide base codons of DNA and the 20 amino
acids and translation termination ("stop") functions
specified thereby. In order to determine the corresponding
correlations for RNA, U is substituted for T in the tables.



TABLE II

FIRST                 SECOND POSITION               THIRD
POSITION      T        C        A        G          POSITION

   T         Phe      Ser      Tyr      Cys            T
             Phe      Ser      Tyr      Cys            C
             Leu      Ser      Stop     Stop           A
             Leu      Ser      Stop     Trp            G

   C         Leu      Pro      His      Arg            T
             Leu      Pro      His      Arg            C
             Leu      Pro      Gln      Arg            A
             Leu      Pro      Gln      Arg            G

   A         Ile      Thr      Asn      Ser            T
             Ile      Thr      Asn      Ser            C
             Ile      Thr      Lys      Arg            A
             Met      Thr      Lys      Arg            G

   G         Val      Ala      Asp      Gly            T
             Val      Ala      Asp      Gly            C
             Val      Ala      Glu      Gly            A
             Val      Ala      Glu      Gly            G


TABLE III

Amino Acid            Specifying Codon(s)

(A) Alanine           GCT, GCC, GCA, GCG
(C) Cysteine          TGT, TGC
(D) Aspartic acid     GAT, GAC
(E) Glutamic acid     GAA, GAG
(F) Phenylalanine     TTT, TTC
(G) Glycine           GGT, GGC, GGA, GGG
(H) Histidine         CAT, CAC
(I) Isoleucine        ATT, ATC, ATA
(K) Lysine            AAA, AAG
(L) Leucine           TTA, TTG, CTT, CTC, CTA, CTG
(M) Methionine        ATG
(N) Asparagine        AAT, AAC
(P) Proline           CCT, CCC, CCA, CCG
(Q) Glutamine         CAA, CAG
(R) Arginine          CGT, CGC, CGA, CGG, AGA, AGG
(S) Serine            TCT, TCC, TCA, TCG, AGT, AGC
(T) Threonine         ACT, ACC, ACA, ACG
(V) Valine            GTT, GTC, GTA, GTG
(W) Tryptophan        TGG
(Y) Tyrosine          TAC, TAT
    STOP              TAA, TAG, TGA
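
The codon correlations tabulated above can also be held in a
small lookup structure. The following Python sketch is offered
for illustration only and forms no part of the disclosed
methods; the names CODON_TABLE, codons_for and translate are
assumptions introduced here. It builds the 64-codon
correspondence in the T, C, A, G order of the rows and columns
of the position table and derives the per-amino-acid codon
listing from it.

```python
from itertools import product

BASES = "TCAG"  # row/column order of the position table

# One-letter symbols in reading order (third position varies fastest);
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first position T
    "LLLLPPPPHHQQRRRR"   # first position C
    "IIIMTTTTNNKKSSRR"   # first position A
    "VVVVAAAADDEEGGGG"   # first position G
)

CODON_TABLE = {
    "".join(codon): symbol
    for codon, symbol in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(symbol):
    """Return every DNA codon specifying the given one-letter symbol."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == symbol)


def translate(sequence):
    """Translate a coding strand, stopping at the first stop codon.

    RNA input is accepted by substituting T for U before lookup.
    """
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        symbol = CODON_TABLE[dna[i:i + 3]]
        if symbol == "*":
            break
        protein.append(symbol)
    return "".join(protein)
```

For instance, codons_for("L") lists the six leucine codons, and
translate("ATGGCTTAA") reads methionine and alanine before the
stop codon.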









A "palindromic" recognition site for restriction
endonuclease cleavage of double stranded DNA is one which
displays "left-to-right and right-to-left" symmetry between
top and bottom base complements, i.e., where "readings" of
complementary base sequences of the recognition site from
5' to 3' ends are identical. Examples of palindromic six
base recognition sites for restriction endonuclease cleavage
include the sites for cleavage by HindIII wherein top and
bottom strands read from 5' to 3' as AAGCTT. A
non-palindromic six base restriction site is exemplified
by the site for cleavage by EcoP15, the top strand of which
reportedly reads CAGCAG. The bottom strand base complement,
when read 5' to 3', is CTGCTG. Essentially by definition,
restriction sites comprising odd numbers of bases (e.g.,
5, 7) are non-palindromic. Certain endonucleases will
cleave at variant forms of a site, which may be palindromic
or not. For example, XhoII will recognize a site which
reads (any purine)GATC(any pyrimidine) including the
palindromic sequence AGATCT and the non-palindromic sequence
GGATCT. Referring to the previously-noted "BRL Restriction
Endonuclease Reference Chart," endonucleases recognizing
six base palindromic sites exclusively include BbrI, ChuI,
Hin173, Hin91R, HinbIII, HinbIII, HindIII, HinfII, HsuI,
BglII, StuI, RruI, ClaI, AvaIII, PvuII, SmaI, XmaI, EccI,
SacII, SboI, SbrI, ShyI, SstII, TglI, AvrII, PvuI, RshI,
RspI, XniI, XorII, XmaIII, BluI, MsiI, ScuI, SexI, SgoI,
SlaI, SluI, SpaI, XhoI, XpaI, Bce170, Bsu1247, PstI, SalPI,
XmaII, XorI, EcoRI, Rsb630I, SacI, SstI, SphI, BamHI, BamKI,
BamNI, BamPI, BstI, KpnI, SalI, XamI, HpaI, XbaI, AtuCI,
BclI, CpeI, SstIV, AosI, MstI, BalI, AsuII, and MlaI.
Endonucleases which recognize only non-palindromic six base
sequences exclusively include Tth111II, EcoP15, AvaI, and
AvrI. Endonucleases recognizing both palindromic and
non-palindromic six base sequences include HaeI, HgiAI,
AcyI, AosII, AsuIII, AccI, ChuII, HincII, HindII, MnnI,
XhoII, HaeII, HinHI, NgoI, and EcoRI'.
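
The symmetry test described above can be stated in a few lines
of code. The following Python sketch is illustrative only; the
function names are assumptions introduced here. A site is
treated as palindromic when its top strand, read 5' to 3', is
identical to its bottom strand read 5' to 3' (i.e., to its
reverse complement).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(site):
    """Return the bottom strand of a site, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(site.upper()))


def is_palindromic(site):
    """True when top and bottom strands read identically 5' to 3'."""
    return site.upper() == reverse_complement(site)
```

Applied to the sites named in the text, is_palindromic("AAGCTT")
and is_palindromic("AGATCT") hold, while "CAGCAG" (whose bottom
strand reads CTGCTG) and "GGATCT" fail the test. A site with an
odd number of bases always fails, because its central base
cannot be its own complement.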

Upon determination of the structure of a desired polypeptide
to be produced, practice of the present invention involves:
preparation of two or more different specific, continuous
double stranded DNA subunit sequences of 100 or more base
pairs in length and having terminal portions of the proper
configuration; serial insertion of subunits into a selected
assembly vector with intermediate amplification of the
hybrid vectors in a selected host organism; use of the
assembly vector (or an alternate selected "expression"
vector including the DNA sequence which has been manufactured
from the subunits) to transform a suitable, selected host;
and, isolating polypeptide sequences expressed in the host
organism. In its most efficient forms, practice of the
invention involves using the same vector for assembly of
the manufactured sequence and for large scale expression
of the polypeptide. Similarly, the host microorganism
employed for expression will ordinarily be the same as
employed for amplifications performed during the subunit
assembly process.

The manufactured DNA sequence may be provided with a
promoter/regulator region for autonomous control of
expression or may be incorporated into a vector in a manner
providing for control of expression by a promoter/regulator
sequence extant in the vector. Manufactured DNA sequences
of the invention may suitably be incorporated into existing
plasmid-borne genes (e.g., β-galactosidase) to form fusion
genes coding for fusion polypeptide products including the
desired amino acid sequences coded for by the manufactured
DNA sequences.

In practice of the invention in its preferred forms,
polypeptides produced may vary in size from about 65 or
70 amino acids up to about 200 or more amino acids. High
levels of expression of the desired polypeptide by selected
transformed host organisms are facilitated through the
manufacture of DNA sequences which include one or more
alternative codons which are preferentially expressed by
the host.



Manufacture of double stranded subunit DNA sequences of
100 to 200 base pairs in length may proceed according to
prior art assembly methods previously referred to, but is
preferably accomplished by means of the rapid and efficient
procedures disclosed in the aforementioned U.S. Application
S.N. 375,493 by Stabinsky and used in certain of the
following examples of actual practice of the present
invention. Briefly put, these procedures involve the
assembly from deoxyoligonucleotides of two or more
different, linear, duplex DNA strands each including a
relatively long double stranded region along with a
relatively short single stranded region on one or both
opposing ends of the double strand. The double stranded
regions are designed to include codons needed to specify
assembly of an initial, terminal or intermediate portion
of the total amino acid sequence of the desired polypeptide.
Where possible, alternative codons preferentially expressed
by a projected host (e.g., E. coli) are employed. Depending
on the relative position to be assumed in the finally
assembled subunit DNA sequence, the single stranded
region(s) of the duplex strands will include a sequence of
bases which, when complemented by bases of other duplex
strands, also provide codons specifying amino acids within
the desired polypeptide sequence.

Duplex strands formed according to this procedure are then
enzymatically annealed to the one or two different duplex
strands having complementary short, single stranded regions
to form a desired continuous double stranded subunit DNA
sequence which codes for the desired polypeptide fragment.

High efficiencies and rapidity in total sequence assembly
are augmented in such procedures by performing a single
annealing reaction involving three or more duplex strands,
the short, single stranded regions of which constitute the
base complement of at most one other single stranded region
of any other duplex strand. Providing all duplex strands
formed with short single stranded regions which uniquely
complement only one of the single stranded regions of any
other duplex is accomplished by alternative codon selection
within the context of genetic code redundancy, and
preferably also in the context of codon preferences of the
projected host organism.
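
The uniqueness condition on single stranded regions lends
itself to a simple design check. The following Python sketch
is illustrative only; the function names and the example
sequences are assumptions introduced here. Each single stranded
region is written 5' to 3', and two regions are treated as
fully complementary when one is the reverse complement of the
other. A design passes when no region has more than one
possible partner.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def pairing_partners(overhangs):
    """Map each overhang index to the indexes of its full complements."""
    partners = {}
    for i, overhang in enumerate(overhangs):
        target = reverse_complement(overhang)
        partners[i] = [
            j for j, other in enumerate(overhangs)
            if j != i and other.upper() == target
        ]
    return partners


def is_unambiguous(overhangs):
    """True when every overhang complements at most one other overhang."""
    return all(len(found) <= 1 for found in pairing_partners(overhangs).values())
```

With the invented five-base regions GATCA, TGATC, CCTAG and
CTAGG, each region has exactly one partner and the set passes.
Adding a second copy of TGATC gives GATCA two possible partners,
and the check fails, which signals that an alternative codon
should be chosen to alter one of the regions.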

The following description of the manufacture of a
hypothetical long DNA sequence coding for a hypothetical
polypeptide will serve to graphically illustrate practice
of the invention, especially in the context of formation
of proper terminal sequences on subunit DNA sequences.

A biologically active polypeptide of interest is isolated
and its amino acids are sequenced to reveal a constitution
of 100 amino acid residues in a given continuous sequence.
Formation of a manufactured gene for microbial expression
of the polypeptide will thus require assembly of at least
300 base pairs for insertion into a selected viral or
circular plasmid DNA vector to be used for transformation
of a selected host organism.

A preliminary consideration in construction of the
manufactured gene is the identity of the projected microbial
host, because foreknowledge of the host allows for codon
selection in the context of codon preferences of the host
species. For purposes of this discussion, the selection of
an E. coli bacterial host is posited.
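
Codon selection against a host preference list can be sketched
as a lookup from amino acid to a single chosen codon. The
following Python sketch is illustrative only. The function name
and, in particular, the preference map are assumptions
introduced here: the codons shown are placeholders for
demonstration and are not measured E. coli preferences, which
would be taken from published codon usage data.

```python
# Hypothetical preference map: placeholder choices for illustration only,
# NOT measured host codon preferences.
PREFERRED_CODON = {
    "M": "ATG",
    "K": "AAA",
    "L": "CTG",
    "G": "GGT",
    "S": "TCT",
}


def reverse_translate(protein, preferred=PREFERRED_CODON, stop="TAA"):
    """Write a coding strand using one chosen codon per amino acid.

    A stop codon is appended; residues absent from the map raise KeyError.
    """
    return "".join(preferred[residue] for residue in protein.upper()) + stop
```

Under these placeholder choices, reverse_translate("MKL") gives
ATGAAACTGTAA. Each residue contributes three bases, so a
polypeptide of 100 residues calls for at least 300 base pairs
of coding sequence.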

A second consideration in construction of the manufactured
gene is the identity of the projected DNA vector employed
in the assembly process. Selection of a suitable vector is
based on existing knowledge of sites for cleavage of the
vector by restriction endonuclease enzymes. More
particularly, the assembly vector is selected on the basis
of including DNA sequences providing endonuclease cleavage
sites which will permit easy insertion of the subunits.
In this regard, the assembly vector selected preferably has
at least two restriction sites which occur only once (i.e.,
are "unique") in the vector prior to performance of any
subunit insertion processes. For the purposes of this
description, the selection of a hypothetical circular DNA
plasmid pBR3000 having a single EcoRI restriction site,
i.e.,

    5'-GAATTC-3'
    3'-CTTAAG-5'

and a single PvuII restriction site, i.e.,

    5'-CAGCTG-3'
    3'-GTCGAC-5'

is posited.
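
Whether a recognition site is "unique" in a circular vector
can be checked by counting its occurrences, including any
occurrence that spans the point at which the circular sequence
is conventionally opened. The following Python sketch is
illustrative only; the function name and the short test
sequence are assumptions introduced here (the sequence is
invented and is not that of any actual plasmid). Only the top
strand is scanned, which suffices for palindromic sites such
as those of EcoRI and PvuII because both strands read
identically.

```python
def count_sites(plasmid, site):
    """Count occurrences of a site on the top strand of a circular sequence."""
    plasmid, site = plasmid.upper(), site.upper()
    # Repeat the first len(site) - 1 bases so that a site spanning the
    # origin of the circular sequence is also found.
    circular = plasmid + plasmid[: len(site) - 1]
    return sum(circular.startswith(site, i) for i in range(len(plasmid)))


# Invented 20-base circular sequence: CAGCTG lies internally, and
# GAATTC is formed across the origin (...GAA + TTC...).
TOY_PLASMID = "TTCAAAACAGCTGAAAAGAA"
```

Here count_sites(TOY_PLASMID, "GAATTC") and
count_sites(TOY_PLASMID, "CAGCTG") each return 1, so both sites
are unique, while the HindIII site AAGCTT is absent.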

The amino acid sequence of the desired polypeptide is then
analyzed in the context of determining availability of
alternate codons for given amino acids (preferably in the
context of codon preferences of the projected E. coli host).
With this information in hand, two subunit DNA sequences
are designed, preferably having a length on the order of
about 150 base pairs, each coding for approximately one-half
of the total amino acid sequence of the desired polypeptide.
For purposes of this description, the two subunits
manufactured will be referred to as "A" and "B".

The methods of the present invention as applied to two
such subunits generally call for: insertion of one of the
subunits into the assembly vector; amplification of the
hybrid vector formed; and insertion of the second subunit
to form a second hybrid including the assembled subunits
in the proper sequence. Because the method involves joining
the two subunits together in a manner permitting the joined
ends to provide a continuous preselected sequence of bases
coding for a continuous preselected sequence of amino acids,
there exist certain requirements concerning the identity
and sequence of the bases which make up the terminal regions
of the manufactured subunits which will be joined to another
subunit. Because the method calls for joining subunits to
the assembly vector, there exist other requirements
concerning the identity and sequence of the bases which
make up those terminal regions of the manufactured subunits
which will be joined to the assembly vector. Because the
subunits are serially, rather than concurrently, inserted
into the assembly vector (and because the methods are most
beneficially practiced when the subunits can be selectively
excised from assembled form to allow for alterations in
selected base sequences therein), still further requirements
exist concerning the identity of the bases in terminal
regions of subunits manufactured. For ease of understanding
in the following discussion of terminal region
characteristics, the opposing terminal regions of subunits
A and B are respectively referred to as A-1 and A-2, and
B-1 and B-2, viz:

    B-2        B-1    A-2        A-1
     ----------        ----------
         B                 A



Assume that an assembly strategy is developed wherein
subunit A is to be inserted into pBR3000 first, with
terminal region A-1 to be ligated to the vector at the
EcoRI restriction site. In the simplest case, the terminal
region is simply provided with an EcoRI "sticky end", i.e.,
a single strand of four bases (-AATT- or -TTAA-) which
will complement a single stranded sequence formed upon
EcoRI digestion of pBR3000. This will allow ligation of
terminal region A-1 to the vector upon treatment with
ligase enzyme. Unless the single strand at the end of
terminal region A-1 is preceded by an appropriate base pair
(e.g., a top strand ending 5'-G-3' over a bottom strand
ending 3'-CTTAA-5'), the entire recognition site will not
be reconstituted upon ligation to the vector. Whether or
not the EcoRI recognition site is reconstituted upon
ligation (i.e., whether or not there will be 0 or 1 EcoRI
sites remaining after insertion of subunit A into the
vector) is at the option of the designer of the strategy.
Alternatively, one may construct the terminal region A-1
of subunit A to include a complete set of base pairs
providing a recognition site for some other endonuclease,
hypothetically designated "XXX", and then add on portions
of the EcoRI recognition site as above to provide an EcoRI
"linker". To be of practical use in excising subunit A
from an assembled sequence, the "XXX" site should not
appear elsewhere in the hybrid plasmid formed upon
insertion. The requirement for construction of terminal
region A-1 is, therefore, that it comprise a portion (i.e.,
all or part) of a base sequence which provides a recognition
site for cleavage by a restriction endonuclease, which
recognition site is entirely present either once or not at
all in the assembly vector upon insertion of the subunit.
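
The effect of the base pair preceding the sticky end can be
illustrated at the level of the top strand alone. The following
Python sketch is illustrative only; the function names and the
vector arm sequence are assumptions introduced here (the bases
following AATTC are invented). It models a subunit end whose
top strand stops at the last paired base and a vector arm whose
top strand begins with the AATT single strand left by EcoRI
digestion. The EcoRI site GAATTC reappears at the junction only
when the last paired base of the subunit is G.

```python
ECORI_SITE = "GAATTC"

# Hypothetical top strand of the vector arm. It begins with the AATT
# single strand left by EcoRI digestion; the later bases are invented.
VECTOR_ARM_TOP = "AATTCTCATGTTTGAC"


def joined_top_strand(subunit_top, vector_arm_top=VECTOR_ARM_TOP):
    """Top strand across the junction after ligation (5' to 3')."""
    return subunit_top.upper() + vector_arm_top.upper()


def ecori_site_reconstituted(subunit_top, vector_arm_top=VECTOR_ARM_TOP):
    """True when GAATTC is restored at the subunit/vector junction."""
    joined = joined_top_strand(subunit_top, vector_arm_top)
    start = len(subunit_top) - 1  # last paired base of the subunit
    return joined[start:start + len(ECORI_SITE)] == ECORI_SITE
```

A subunit top strand ending in G (e.g., ATGGCTG) restores the
site on ligation, leaving one EcoRI site at the junction. A
subunit ending in A (e.g., ATGGCTA) ligates equally well through
the AATT single strand but yields AAATTC, so no EcoRI site
remains at that junction.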

Assume that terminal region B-2 of subunit B is also to
be joined to the assembly vector (e.g., at the single
recognition site for PvuII cleavage present on pBR3000).
The requirements for construction of terminal region B-2
are the same as for construction of A-1, except that the
second endonuclease enzyme in reference to which the
construction of B-2 is made must be different from that
with respect to which the construction of A-1 is made. If
recognition sites are the same, one will not be able to
separately excise segments A and B from the fully assembled
sequence.

The above assumptions require, then, that terminal region
A-2 is to be ligated to terminal region B-1 in the final
pBR3000 hybrid. Either the terminal region A-2 or the
terminal region B-1 is constructed to comprise a portion
of a (preferably palindromic six base) recognition site
for restriction endonuclease cleavage by hypothetical third
endonuclease "YYY", which recognition site will be entirely
present once and only once in the expression vector upon
insertion of all subunits thereinto, i.e., at an
intermediate position in the assemblage of subunits. There
exist a number of strategies for obtaining this result.
In one alternative strategy, the entire recognition site
of "YYY" is contained in terminal region A-2 and the region
additionally includes the one or more portions of other
recognition sites for endonuclease cleavage needed to
(1) complete the insertion of subunit A into the assembly
vector for amplification purposes, and (2) allow for
subsequent joining of subunit A to subunit B. In this case,
terminal region B-1 would have at its end only the bases
necessary to link it to terminal region A-2. In another
alternative, the entire "YYY" recognition site is included
in terminal region B-1 and B-1 further includes at its end
a portion of a recognition site for endonuclease cleavage
which is useful for joining subunit A to subunit B.

As another alternative, terminal region B-1 may contain
at its end a portion of the "YYY" recognition site.
Terminal region A-2 would then contain the entire "YYY"
recognition site plus, at its end, a suitable "linker" for
joining A-2 to the assembly vector prior to amplification
of subunit A (e.g., a PvuII "sticky end"). After
amplification of the hybrid containing subunit A, the hybrid
would be cleaved with "YYY" (leaving a sticky-ended portion
of the "YYY" recognition site exposed on the end of A-2)
and subunit B could be inserted with its B-1 terminal region
joined with the end of terminal region A-2 to reconstitute
the entire "YYY" recognition site. The requirement for
construction of the terminal regions of all segments (other
than A-1 and B-2) is that one or the other or both (i.e.,
"at least half") comprise a portion (i.e., include all or
part) of a recognition site for third restriction
endonuclease cleavage, which recognition site is entirely
present once and only once (i.e., is "unique") in said
assembly vector after insertion of all subunits thereinto.
To generate a member of the class of novel DNA sequences
of the invention, the recognition site of the third
endonuclease should be a six base palindromic recognition
site.

While a subunit "terminal region" as referred to above
could be considered to extend from the subunit end fully
halfway along the subunit to its center, as a practical
matter the constructions noted would ordinarily be performed
in the final 10 or 20 bases. Similarly, while the unique
"intermediate" recognition site in the two subunit
assemblage may be up to three times closer to one end of
the manufactured sequence than it is to the other, it will
ordinarily be located near the center of the sequence. If,
in the above description, a synthetic plan was generated
calling for preparation of three subunits to be joined,
the manufactured gene would include two unique restriction
enzyme cleavage sites in intermediate positions, at least
one of which will have a palindromic six base recognition
site in the class of new DNA sequences of the invention.

The significant advantages of the above-described
process are manifest. Because the manufactured
gene now includes one or more unique restriction
endonuclease cleavage sites at intermediate positions
along its length, modifications in the codon sequence
of the two subunits joined at the cleavage site may
be effected with great facility and without the need
to re-synthesize the entire manufactured gene.
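
The excision-and-replacement advantage can be pictured as a string operation on the top strand. The sketch below is offered only as an illustration under stated assumptions: `swap_cassette` is a hypothetical helper, the toy gene is invented, and sticky-end chemistry is ignored (each recognition site is simply kept intact on either side of the replaced region). GGATCC and AAGCTT are the BamHI and HindIII recognition sequences.

```python
# Illustrative sketch only; the helper name and toy sequences are assumptions.
def swap_cassette(gene, left_site, right_site, new_cassette):
    """Replace the bases lying between two unique sites, keeping both sites."""
    if gene.count(left_site) != 1 or gene.count(right_site) != 1:
        raise ValueError("both flanking sites must be unique in the gene")
    start = gene.index(left_site) + len(left_site)
    end = gene.index(right_site)
    if start > end:
        raise ValueError("left site must lie upstream of right site")
    return gene[:start] + new_cassette + gene[end:]


# Hypothetical gene with one BamHI site and one HindIII site.
toy_gene = "AAAGGATCCTTTTTTAAGCTTCCC"
variant = swap_cassette(toy_gene, "GGATCC", "AAGCTT", "CAGCAG")
```

Only the bases between the two sites change; everything outside them, including the sites themselves, is carried over unchanged, which is the point of providing unique intermediate sites.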

Following are illustrative examples of the
actual practice of the invention in formation of manufactured
genes capable of directing the synthesis
of: human immune interferon (IFNγ) and analogs thereof;
human leukocyte interferon of the F subtype (IFN-αF)
and analogs thereof; and multiple consensus leukocyte
interferons which, due to homology to IFN-αF, can be
named as IFN-αF analogs. It will be apparent from
these examples that the gene manufacturing methodology
of the present invention provides an overall
synthetic strategy for the truly rapid, efficient
synthesis and expression of genes of a length in excess
of 200 base pairs within a highly flexible framework
allowing for variations in the structures of products
to be expressed which has not heretofore been available
to investigators practicing recombinant DNA techniques.

EXAMPLE 1 



In the procedure for construction of synthetic
genes for expression of human IFNγ, a first selection
made was the choice of E. coli as a microbial host
for eventual expression of the desired polypeptides.
Thereafter, codon selection procedures were carried
out in the context of E. coli codon preferences enumerated
in the Grantham publications, supra. A second
selection made was the choice of pBR322 as an expression
vector and, significantly, as the assembly vector
to be employed in amplification of subunit sequences.
In regard to the latter factor, the plasmid was selected
with the knowledge that it included single BamHI,
HindIII, and SalI restriction sites. With these restriction
sites and the known sequence of amino acids in
human immune interferon in mind, a general plan for
formation of three "major" subunit DNA sequences (IF-3,
IF-2 and IF-1) and one "minor" subunit DNA sequence
(IF-4) was evolved. This plan is illustrated by Table
IV below.

The "minor" sequence (IF-4) is seen to include
codons for the 4th through 1st (5'-TGT TAC TGC CAG)
amino acids and an ATG codon for an initiating methionine
[Met-1]. In this construction, it also includes
additional bases to provide a portion of a control
sequence involved in an expression vector assembly from
pBR322 as described infra.
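
As a cross-check on codon assignments such as those carried by the "minor" sequence, a codon string can be translated with the standard genetic code. The sketch below is illustrative only: the function and variable names are assumptions, one-letter amino acid codes are used, and hyphens or spaces between codons are ignored.

```python
# Illustrative sketch only; names are assumptions. Standard genetic code,
# codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(top_strand):
    """Translate a 5'-to-3' top strand into one-letter amino acid codes."""
    seq = top_strand.upper().replace("-", "").replace(" ", "")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# The four codons of the "minor" sequence, preceded by the initiating ATG.
minor_peptide = translate("ATG TGT TAC TGC CAG")
```

Here `minor_peptide` is "MCYCQ", i.e. Met followed by Cys-Tyr-Cys-Gln, matching the first four amino acids stated for the IF-4 sequence.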

An alternative form of subunit IF-1 for use
in synthesis of a manufactured gene for [Arg140]IFNγ
included the codon 5'-CGT in place of 5'-CAG (for
[Gln140]) at the codon site specifying the 140th amino
acid.

The codon sequence plan for the top strand
of the polypeptide-specifying portion of the total DNA
sequence synthesized was as follows:

5'-TGT-TAC-TGC-CAG-GAT-CCG-TAC-GTT-AAG-GAA-GCA-GAA-
AAG-CTG-AAA-AAA-TAC-TTC-AAG-GCA-GGC-GAC-TCC-GAC-GTA-
GCT-GAT-AAG-GGG-ACC-CTG-TTC-CTG-GGT-ATC-CTA-AAA-AAG-
TGG-AAA-GAG-GAA-TGC-GAG-CTG-AAG-ATC-ATG-GAG-TCT-GAA-
ATT-GTA-AGC-TTG-TAC-TTC-AAA-GTG-TTG-AAG-AAG-TTC-AAA-
GAC-GAT-CAA-TCC-ATC-CAG-AAG-AGG-GTA-GAA-AGT-ATT-AAG-
GAG-GAC-ATG-AAC-GTA-AAA-TCC-TTT-AAC-AGC-AAC-AAG-AAG-
AAA-CGC-GAT-GAC-TTC-GAG-AAA-CTG-ACT-AAC-TAC-TCT-GTT-
AGA-GAT-GTG-AAG-GTG-CAG-CGT-AAA-GCT-ATT-CAC-GAA-CTG-
ATC-CAA-GTT-ATG-GCT-GAA-CTG-TCT-CCT-GCG-GCA-AAG-AGT-
GGG-AAA-GGC-AAG-CGT-AGC-CAG-ATG-CTG-TTT-CAG-[or CGT]-
CGT-GGG-CGT-GGT-TCT-CAG.

In the above sequence, the control sequence
bases and the initial methionine-specifying codon
are not illustrated, nor are termination sequences
or sequences providing a terminal SalI restriction
site. Vertical lines separate top strand portions
attributable to each of the subunit sequences.

The following example illustrates a preferred
general procedure for preparation of deoxyoligonucleotides
for use in the manufacture of DNA sequences
of the invention.

EXAMPLE 2 

Oligonucleotide fragments were synthesized
using a four-step procedure and several intermediate
washes. Polymer bound dimethoxytrityl protected nucleoside
in a sintered glass funnel was first stripped
of its 5'-protecting group (dimethoxytrityl) using
3% trichloroacetic acid in dichloromethane for 1-1/2
minutes. The polymer was then washed with methanol,
tetrahydrofuran and acetonitrile. The washed polymer
was then rinsed with dry acetonitrile, placed under
argon and then treated in the condensation step as
follows. 0.5 ml of a solution of 10 mg tetrazole
in acetonitrile was added to the reaction vessel containing
polymer. Then 0.5 ml of 30 mg protected nucleoside
phosphoramidite in acetonitrile was added. This
reaction was agitated and allowed to react for 2 minutes.
The reactants were then removed by suction and the
polymer rinsed with acetonitrile. This was followed
by the oxidation step wherein 1 ml of a solution containing
0.1 molar I2 in 2,6-lutidine/H2O/THF, 1:2:2, was
reacted with the polymer bound oligonucleotide chain
for 2 minutes. Following a THF rinse, capping was
done using a solution of dimethylaminopyridine (6.5 g
in 100 ml THF) and acetic anhydride in the proportion
4:1 for 2 minutes. This was followed by a methanol
rinse and a THF rinse. Then the cycle began again
with a trichloroacetic acid in CH2Cl2 treatment.
The cycle was repeated until the desired oligonucleotide
sequence was obtained.

The final oligonucleotide chain was treated
with thiophenol, dioxane, triethylamine 1:2:2, for
45 minutes at room temperature. Then, after rinsing
with dioxane, methanol, and diethylether, the oligonucleotide
was cleaved from the polymer with concentrated
ammonium hydroxide at room temperature. After decanting
the solution from the polymer, the concentrated ammonium
hydroxide solution was heated at 60°C for 16 hours
in a sealed tube.
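
The four-step cycle described above can be tabulated to estimate reaction time per oligonucleotide. This is a rough sketch under two labeled assumptions: intermediate washes and rinses are not timed, and an oligonucleotide of n bases is taken to need n - 1 cycles because the first nucleoside is already polymer bound. The step durations are those stated in the text; the names are illustrative.

```python
# Illustrative sketch only; names are assumptions, durations are from the text.
CYCLE_STEPS = [
    ("detritylation (3% trichloroacetic acid in dichloromethane)", 1.5),
    ("condensation (tetrazole plus nucleoside phosphoramidite)", 2.0),
    ("oxidation (0.1 molar iodine in 2,6-lutidine/water/THF)", 2.0),
    ("capping (dimethylaminopyridine in THF plus acetic anhydride)", 2.0),
]


def reaction_minutes_per_cycle():
    """Sum of the stated reaction times for one cycle, washes excluded."""
    return sum(minutes for _, minutes in CYCLE_STEPS)


def cycles_needed(oligo_length):
    """Assumed cycle count: the first nucleoside is already on the polymer."""
    if oligo_length < 1:
        raise ValueError("oligonucleotide length must be at least 1")
    return oligo_length - 1


def total_reaction_minutes(oligo_length):
    """Reaction time for the whole chain, excluding washes and workup."""
    return cycles_needed(oligo_length) * reaction_minutes_per_cycle()
```

Under these assumptions a 15-base segment takes 14 cycles of 7.5 minutes, or 105 minutes of reaction time, before the thiophenol and ammonium hydroxide treatments.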

Each oligonucleotide solution was then extracted
four times with 1-butanol. The solution was loaded
into a 20% polyacrylamide 7 molar urea electrophoresis
gel and, after running, the appropriate product DNA
band was isolated.

Subunits were then assembled from deoxyoligonucleotides
according to the general procedure for
assembly of subunit IF-1.

Following the isolation of the desired 14
DNA segments, subunit IF-1 was constructed in the
following manner:

1. One nanomole of each of the DNA fragments,
excluding segment 13 and segment 2 which contain 5'
cohesive ends, was subjected to 5'-phosphorylation;

2. The complementary strands of DNA, segments
13 and 14, 11 and 12, 9 and 10, 7 and 8, 5 and 6,
3 and 4 and 1 and 2, were combined together, warmed
to 90° and slowly cooled to 25°;

3. The resulting annealed pairs of DNA
were combined sequentially and warmed to 37° and slowly
cooled to 25°;

4. The concentration of ATP and DTT in
the final tube containing segments 1 thru 14 was adjusted
to 150 µM and 18 mM respectively. Twenty units of
T-4 DNA ligase was added to this solution and the
reaction was incubated at 4° for 18 hrs;

5. The resulting crude product was heated
to 90° for 2 min. and subjected to gel filtration
on Sephadex G50/40 using 10 mM triethyl ammonium bicarbonate
as the eluent;

6. The desired product was purified, following
5' phosphorylation, using an 8% polyacrylamide-TBE
gel.

Subunits IF-2, IF-3 and IF-4 were constructed
in a similar manner.

The following example relates to: assembly
of the complete human immune interferon gene from
subunits IF-1, IF-2, IF-3, and IF-4; procedures for
the growing, under appropriate nutrient conditions,
of transformed E. coli cells; the isolation of human
immune interferon from the cells; and the testing
of biological activity of interferon so isolated.

EXAMPLE 3 



The major steps in the general procedure
for assembly of the complete human IFNγ specifying
genes from subunits IF-1, IF-2, and IF-3 are illustrated
in Figure 1.

The 136 base pair subunit IF-1 was electroeluted
from the gel, ethanol precipitated and resuspended
in water at a concentration of 0.05 pmol/µl. Plasmid
pBR322 (2.0 pmol) was digested with EcoRI and SalI,
treated with phosphatase, phenol extracted, ethanol
precipitated, and resuspended in water at a concentration
of 0.1 pmol/µl. Ligation was carried out with
0.1 pmol of the plasmid and 0.2 pmol of subunit IF-1,
using T-4 DNA ligase to form hybrid plasmid pINT1.
E. coli were transformed and multiple copies of pINT1
were isolated therefrom.
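
The ligation quantities above imply particular pipetting volumes, which follow from volume = amount / concentration. The short sketch below works this out; the function and variable names are illustrative assumptions, and the amounts and concentrations are those given in the text.

```python
# Illustrative sketch only; names are assumptions, figures are from the text.
def volume_ul(amount_pmol, conc_pmol_per_ul):
    """Volume in microliters that contains the requested amount."""
    if conc_pmol_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return amount_pmol / conc_pmol_per_ul


plasmid_ul = volume_ul(0.1, 0.1)    # digested pBR322 at 0.1 pmol/ul
subunit_ul = volume_ul(0.2, 0.05)   # subunit IF-1 at 0.05 pmol/ul
insert_to_vector_ratio = 0.2 / 0.1  # molar ratio of subunit to plasmid
```

That is, about 1 µl of plasmid stock and 4 µl of subunit stock, a two-fold molar excess of subunit over vector.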

The above procedure was repeated for purposes
of inserting the 153 base pair subunit IF-2 to form
pINT2 except that the plasmid was digested with EcoRI
and BglII. The 153 base pair IF-3 subunit was similarly
inserted into pINT2 during manufacture of pINT3 except
that EcoRI and HindIII were used to digest the plasmid.

An IF-4 subunit was employed in the construction
of the final expression vector as follows: Plasmid
pVvI was purchased from Stanford University, Palo
Alto, California, and digested with PvuII. Using
standard procedures, an EcoRI recognition site was
inserted in the plasmid at a PvuII site. Copies of
this hybrid were then digested with EcoRI and HpaI
to provide a 245 base pair sequence including a portion
of the trp promoter/operator region. By standard procedures,
IF-4 was added to the HpaI site in order to
incorporate the remaining 37 base pairs of the complete
trp translational initiation signal and bases providing
codons for the initial four amino acids of immune
interferon (Cys-Tyr-Cys-Gln). The resulting assembly
was then inserted into pINT3 which had been digested
with EcoRI and BamHI to yield a plasmid designated
pINTγ-trpI7.

E. coli cells containing pINTγ-trpI7 were
grown on K media in the absence of tryptophan to
an O.D.600 of 1. Indoleacrylic acid was added at
a concentration of 20 µg per ml and the cells were
cultured for an additional 2 hours at 37°C. Cells
were harvested by centrifugation and the cell pellet
was resuspended in fetal calf serum buffered with
HEPES (pH 8.0). Cells were lysed by one passage through
a French press at 10,000 psi. The cell lysate was
cleared of debris by centrifugation and the supernatant
was assayed for antiviral activity by the CPE assay
["The Interferon System" Stewart, ed., Springer-Verlag,
N.Y., N.Y. (1981)]. The isolated product of expression
was designated γ-1.

This example relates to a modification in
the DNA sequence of plasmid pINTγ-trpI7 which facilitated
the use of the vector in the trp promoter-controlled
expression of structural genes coding for,
e.g., analogs of IFN-γ and IFN-αF.

EXAMPLE 4 

Segment IF-4, as previously noted, had been
constructed to include bases coding for an initial
methionine and the first four amino acids of IFN-γ
as well as 37 base pairs (commencing at its 5' end
with a HpaI blunt end) which completed the 3' end
of a trp promoter/operator sequence, including a
Shine-Dalgarno ribosome binding sequence. It was clear that
manipulations involving sequences coding for IFN-γ analogs
and for polypeptides other than IFN-γ would be facilitated
if a restriction site 3' to the entire trp promoter/operator
region could be established. By way
of illustration, sequences corresponding to IF-4 for
other genes could then be constructed without having
to reconstruct the entire 37 base pairs needed to
reconstitute the trp promoter/operator and would only
require bases at the 5' end such as would facilitate
insertion in the proper reading frame with the complete
promoter/operator.

Consistent with this goal, sequence IF-4
was reconstructed to incorporate an XbaI restriction
site 3' to the base pairs completing the trp promoter/operator.
The construction is shown in Table V below.

TABLE V

HpaI
AA CTA GTA CGC AAG TTC ACG TAA AAA GGG
TT GAT CAT GCG TTC AAG TGC ATT TTT CCC

      XbaI    -1   1   2   3   4   BamHI
              Met Cys Tyr Cys Gln
TAT CTAGAA    ATG TGT TAC TGC CAG
ATA GAT CTT   TAC ACA ATG ACG GTC CTAG

This variant form of segment IF-4 was inserted
in pINTγ-trpI7 (digested with HpaI and BamHI) to generate
plasmid pINTγ-TXb4 from which the IFN-γ-specifying
gene could be deleted by digestion with XbaI and SalI
and the entire trp promoter/operator would remain
on the large fragment.

The following example relates to construction
of structural analogs of IFN-γ whose polypeptide structure
differs from that of IFN-γ in terms of the
identity or location of one or more amino acids.



EXAMPLE 5 

A first class of analogs of IFN-γ was formed
which included a lysine residue at position 81 in
place of asparagine. The single base sequence change
needed to generate this analog was in subunit IF-2
of Table IV in segments 35 and 36. The asparagine-specifying
codon, AAC, was replaced by the lysine-specifying
codon, AAG. The isolated product of expression
of such a modified DNA sequence, [Lys81]IFN-γ,
was designated γ-10.

Another class of IFNγ analogs consists of
polypeptides wherein one or more potential glycosylation
sites present in the amino acid sequence are deleted.
More particularly, these consist of [Arg140]IFNγ or
[Gln140]IFNγ wherein the polypeptide sequence fails
to include one or more naturally occurring sequences,
[(Asn or Gln)-(ANY)-(Ser or Thr)], which are known
to provide sites for glycosylation of the polypeptide.
One such sequence in IFNγ spans positions 28 through
30 (Asn-Gly-Thr); another spans positions 101 through
103 (Asn-Tyr-Ser). Preparation of an analog according
to the invention with a modification at positions
28-30 involved cleavage of plasmid containing all
four IFN-γ subunits with BamHI and HindIII to delete
subunit IF-3, followed by insertion of a variant of
subunit IF-3 wherein the AAC codon for asparagine
therein is replaced by the codon for glutamine, CAG.
(Such replacement is effected by modification of deoxyoligonucleotide
segment 37 to include CAG rather than
AAC and of segment 38 to include GTC rather than TTG.
See Table IV.) The isolated product of expression
of such a modified DNA sequence, [Gln28]IFN-γ, was
designated γ-12. Polypeptide analogs of this type
would likely not be glycosylated if expressed in yeast
cells. Polypeptide analogs as so produced are not
expected to differ appreciably from naturally-occurring
IFNγ in terms of reactivity with antibodies to the
natural form, or in duration of antiproliferative
or immunomodulatory pharmacological effects, but may
display enhanced potency of pharmacological activity
in one or more manner.
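
The paired segment changes noted parenthetically above (CAG for AAC in segment 37, GTC for TTG in segment 38) follow from base complementarity, with the bottom strand triplet written in register beneath the top strand codon. A minimal sketch, with illustrative function names that are assumptions:

```python
# Illustrative sketch only; function names are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base complement, written in register with the input strand."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def paired_change(old_codon, new_codon):
    """Bottom strand triplets matching an old and a new top strand codon."""
    return complement(old_codon), complement(new_codon)


# Asparagine (AAC) to glutamine (CAG) at position 28.
old_bottom, new_bottom = paired_change("AAC", "CAG")
```

Here `old_bottom` is "TTG" and `new_bottom` is "GTC", the change stated for segment 38.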

Other classes of IFNγ analogs consist of
polypeptides wherein the [Trp39] residue is replaced
by [Phe39], and/or wherein one or more of the methionine
residues at amino acid positions 48, 80, 120 and 137
are replaced by, e.g., leucine, and/or wherein cysteines
at amino acid positions 1 and 3 are replaced by, e.g.,
serine or are completely eliminated. These last-mentioned
analogs may be more easily isolated upon microbial
expression because they lack the capacity for
intermolecular disulfide bridge formation.

Replacement of tryptophane with phenylalanine
at position 39 required substitution for a TGG codon
in subunit IF-3 with TTC (although TTT could also
have been used), effected by modification of the deoxyoligonucleotide
segment 33 (TGG to TTC) and overlapping
segment 36 (TGA to TAG) used to manufacture IF-3.
[Phe39, Lys81]IFN-γ, the isolated product of expression
of such a modified DNA sequence (which also included
the above-noted replacement of asparagine by lysine
at position 81) was designated

In a like manner, replacement of one or
more methionines at positions 48, 80, 120, and 137,
respectively, involves alteration of subunit IF-3
(with reconstruction of deoxyoligonucleotides 31,
32 and 34), subunit IF-2 (with reconstruction of deoxyoligonucleotide
segments 21 and 22); and subunit IF-1
(with reconstruction of deoxyoligonucleotide segments
7 and 10 and/or 3 and 4). An analog of IFN-γ wherein
threonine replaced methionine at position 48 was obtained
by modification of segment 31 in subunit IF-3 to delete
the methionine-specifying codon ATG and replace it
with an ACT codon. Alterations in segment 34 (TAC
to TGA) were also needed to effect this change. [Thr48,
Lys81]IFN-γ, the isolated product of expression of
such a modified DNA sequence (also including a lysine-specifying
codon at position 81) was designated

Replacement or deletion of cysteines at
positions 1 and 3 involves only alteration of subunit
IF-4. As a first example, modifications in construction
of subunit IF-4 to replace both of the cysteine-specifying
codons at positions 1 and 3 (TGT and TGC, respectively)
with the serine-specifying codon, TCT, required
reconstruction of only 2 segments (see e and f of
Table IV). [Ser1, Ser3, Lys81]IFN-γ, the isolated
product of expression of the thus modified [Lys81]IFN-γ
DNA sequence, was designated
As another example,
[Lys1, Lys2, Gln3, Lys81]IFN-γ, designated γ-5, was
obtained as an expression product of a modified construction
of subunit IF-4 wherein codons AAA, AAA, and
CAA respectively replaced TGT, TAC and TGC. Finally,
[des-Cys1, des-Tyr2, des-Cys3, Lys81]IFN-γ, designated
γ-4, was obtained by means of modification of subunit
IF-4 sections to

5'-ATG CAG-3'
3'-TAC GTC-5'

in the amino acid specifying
region. It should be noted that the above modifications
in the initial amino acid coding regions of
the gene were greatly facilitated by the construction
of pINTγ-TXb4 in Example 4 which meant that only short
sequences with XbaI and BamHI sticky ends needed to
be constructed to complete the amino terminal protein
coding sequence and link the gene to the complete
trp promoter.

Among other classes of IFN-γ analog polypeptide
provided by the present invention are those including
polypeptides which differ from IFN-γ in terms
of amino acids traditionally held to be involved in
secondary and tertiary configuration of polypeptides.
As an example, provision of a cysteine residue at
an intermediate position in the IFN-γ polypeptide
may generate a species of polypeptide structurally
facilitative of formation of intramolecular disulfide
bridges between amino terminal and intermediate cysteine
residues such as found in IFN-α. Further, insertion
or deletion of prolines in polypeptides according
to the invention may alter linear and bending configurations
with corresponding effects on biological activity.

35 [Lys^''-, Cys^^] IFN-Y, desigated y-9/ was isolated upon 
expression of a DNA sequence fashioned with 3»Iagc-5' 
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replacing IIIHg-S' sections 17 and 18 of subunit 
IF-2. A DNA sequence specifying [Cys lIFN-y (to 
be designated y-lD' is being constructed by the saine^^ ' 
general procedure. • Likewise,- a gene coding for -tCya , 
5 Pro^°^]IFN-V is under xjonstruction with the threonine- 
specifying codon ACA (section 15 of IP-2) being replaced 
by the proline-specifying codon CCA. 

[Glu5]IFN-γ, to be designated γ-13, will
result from modification of section 43 in subunit
IF-3 to include the glutamate codon, GAA, rather than
the aspartic acid specifying codon, GAT. Because
such a change would no longer permit the presence
of a BamHI recognition site at that locus, subunit
IF-3 will likely need to be constructed as a composite
subunit with the amino acid specifying portions of
subunit IF-4, leaving no restriction site between
XbaI and HindIII in the assembled gene. This analog
of IFN-γ is expected to be less acid labile than the
naturally-occurring form.

The above analogs having the above-noted
tryptophane and/or methionine and/or cysteine replacements
are not expected to differ from naturally-occurring
IFNγ in terms of reactivity with antibodies to the
natural form or in potency of antiproliferative or
immunomodulatory effect but are expected to have enhanced
duration of pharmacological effects.

Still another class of analogs consists
of polypeptides of a "hybrid" or "fused" type which
include one or more additional amino acids at the
end of the prescribed sequence. These would be expressed
by DNA sequences formed by the addition, to the entire
sequence coding for IFNγ, of another manufactured
DNA sequence, e.g., one of the subunits coding for
a sequence of polypeptides peculiar to LeIFN-Con,
described infra. The polypeptide expressed is expected
to retain at least some of the antibody reactivity
of naturally-occurring IFNγ and to display some degree
of the antibody reactivity of LeIFN. Its pharmacological
activities are expected to be superior to naturally-occurring
IFN-γ both in terms of potency and duration
of action.

Table VI, below, sets forth the results
of studies of antiviral activity of IFN-γ prepared
according to the invention along with that of certain
of the analogs tested. Relative antiviral activity
was assayed in human HeLa cells infected with encephalomyocarditis
virus (EMCV) per unit binding to a monoclonal
antibody to IFN-γ as determined in an immunoabsorbent
assay.

TABLE VI

Interferon     Relative Antiviral Activity

γ-1            1.00
γ-4            0.60
γ-5            0.10
γ-6            0.06
γ-10           0.51

The following example relates to modifications 
in the polypeptide coding region of the DNA sequences 
of the previous examples which serve to enhance the 
expression of desired products. 

EXAMPLE 6 



Preliminary analyses performed on the polypeptide
products of microbial expression of manufactured
DNA sequences coding for IFN-γ and analogs of IFN-γ
revealed that two major proteins were produced in
approximately equal quantities: a 17K form corresponding
to the complete 146 amino acid sequence and a 12K
form corresponding to an interferon fragment missing
about 50 amino acids of the amino terminal. Review
of codon usage in the manufactured gene revealed the
likelihood that the abbreviated species was formed
as a result of microbial translation initiation at
the Met48 residue brought about by the similarity
of base sequences 5' thereto to a Shine-Dalgarno ribosome
binding sequence. It thus appeared that while
about half of the transcribed mRNA's bound to ribosomes
only at a locus prior to the initial methionine, the
other half were bound at a locus prior to the Met48 codon.
In order to diminish the likelihood of ribosome binding
internally within the polypeptide coding region, sections
33 and 34 of subunit IF-3 were reconstructed. More
specifically, the GAG codon employed to specify a
glutamate residue at position 41 was replaced by the
alternate, GAA, codon and the CGT codon employed to
specify arginine at position 45 was replaced by the
alternate, CGC, codon. These changes, effected during
construction of the gene specifying the γ-6 analog
of IFN-γ, resulted in the expression of a single
predominant species of polypeptide of the appropriate
length.
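
The kind of review described above, looking for ribosome-binding-like sequences ahead of an internal methionine codon, can be sketched as a simple scan. Everything here is an illustrative assumption rather than the procedure actually used: the AGGAGG core is the commonly cited Shine-Dalgarno consensus, the 20 base window and 4 base minimum match are arbitrary choices, and the two fragments are hypothetical sequences that differ only by a GAG to GAA and a CGT to CGC change.

```python
# Illustrative sketch only; window, threshold and toy fragments are assumptions.
SD_CORE = "AGGAGG"  # commonly cited Shine-Dalgarno core, DNA alphabet


def sd_like_sites(seq, window=20, min_len=4):
    """List (ATG index, matched motifs) for ATGs preceded by SD-like runs."""
    seq = seq.upper().replace("-", "")
    motifs = {SD_CORE[i:i + min_len] for i in range(len(SD_CORE) - min_len + 1)}
    hits = []
    start = seq.find("ATG")
    while start != -1:
        upstream = seq[max(0, start - window):start]
        found = sorted(m for m in motifs if m in upstream)
        if found:
            hits.append((start, found))
        start = seq.find("ATG", start + 1)
    return hits


# Hypothetical fragments before and after two silent codon changes.
before = "AAAGAGGAATCTGATCGTAAAATCATG"
after = "AAAGAAGAATCTGATCGCAAAATCATG"
```

With these toy inputs the scan flags the ATG in `before` (an AGGA run lies within the upstream window) and reports nothing for `after`, illustrating how a synonymous codon choice can remove such a run without altering the encoded polypeptide.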

The following Examples 7 and 8 relate to
procedures of the invention for generating a manufactured
gene specifying the F subtype of human leukocyte
interferon ("LeuIFN-F" or "IFN-αF") and polypeptide
analogs thereof.

EXAMPLE 7

The amino acid sequence for the human leukocyte
interferon of the F subtype has been deduced
by way of sequencing of cDNA clones. See, e.g., Goeddel,
et al., Nature, 290, pp. 20-26 (1981). The general
procedures of prior Examples 1, 2 and 3 were employed
in the design and assembly of a manufactured DNA sequence
for use in microbial expression of IFN-αF in E. coli
by means of a pBR322-derived expression vector. A
general plan for the construction of three "major"
subunit DNA sequences (LeuIFN-F I, LeuIFN-F II and
LeuIFN-F III) and one "minor" subunit DNA sequence
(LeuIFN-F IV) was evolved and is shown in Table VII
below.



TABLE VII

[Construction plan for subunits LeuIFN-F I, LeuIFN-F II, LeuIFN-F III and LeuIFN-F IV; the table contents are illegible in this copy.]

As in the case of the gene manufacture
strategy set out in Table IV, the strategy of Table VII
involves use of bacterial preference codons wherever it
is not inconsistent with deoxyribonucleotide segment
constructions. Construction of an expression vector
with the subunits was similar to that involved with
the IFNγ-specifying gene, with minor differences in
restriction enzymes employed. Subunit I is ligated
into pBR322 cut with EcoRI and SalI. (Note that the
subunit terminal portion includes a single stranded
SalI "sticky end" but, upon complementation, a SalI
recognition site is not reconstituted. A full BamHI
recognition site remains, however, allowing for subsequent
excision of the subunit.) This first intermediate
plasmid is amplified and subunit II is inserted
into the amplified plasmid after again cutting with
EcoRI and SalI. The second intermediate plasmid thus
formed is amplified and subunit III is inserted into
the amplified plasmid cut with EcoRI and HindIII. The
third intermediate plasmid thus formed is amplified.
Subunit IV is ligated to an EcoRI and XbaI fragment
isolated from pINTγ-TXb4 of Example 4 and this ligation
product (having EcoRI and BstEII sticky ends) is then
inserted into the amplified third intermediate plasmid
cut with EcoRI and BstEII to yield the final expression
vector.

The isolated product of trp promoter/operator
controlled E. coli expression of the manufactured DNA
sequence of Table VII as inserted into the final expression
vector was designated IFN-αF1.

EXAMPLE 8

As discussed infra with respect to consensus
leukocyte interferon, those human leukocyte interferon
subtypes having a threonine residue at position 14 and
a methionine residue at position 16 are reputed to
display greater antiviral activity than those subtypes
possessing Ala14 and Ile16 residues. An analog of
human leukocyte interferon subtype F was therefore manufactured
by means of microbial expression of a DNA
sequence of Example 7 which had been altered to specify
threonine and methionine as residues 14 and 16, respectively.
More specifically, [Thr14, Met16]IFN-αF,
designated IFN-αF2, was expressed in E. coli upon transformation
with a vector of Example 7 which had been
cut with SalI and HindIII and into which a modified
subunit II (of Table VII) was inserted. The specific
modifications of subunit II involved assembly with segment
39 altered to replace the alanine-specifying
codon, GCT, with a threonine-specifying ACT codon and
replace the isoleucine-specifying codon, ATT, with an
ATG codon. Corresponding changes in complementary
bases were made in section 40 of subunit LeuIFN-F II.

The following Examples 9 and 10 relate to
practice of the invention in the microbial synthesis
of consensus human leukocyte interferon polypeptides
which can be designated as analogs of human leukocyte
interferon subtype F.

EXAMPLE 9

"Consensus human leukocyte interferon" ("IFN-Con,"
"LeuIFN-Con") as employed herein shall mean a non-naturally-occurring
polypeptide which predominantly
includes those amino acid residues which are common
to all naturally-occurring human leukocyte interferon
subtype sequences and which includes, at one or more
of those positions wherein there is no amino acid
common to all subtypes, an amino acid which predominantly
occurs at that position and in no event includes
any amino acid residue which is not extant in that position
in at least one naturally-occurring subtype. (For
purposes of this definition, subtype A is positionally
aligned with other subtypes and thus reveals a "missing"
amino acid at position 44.) As so defined, a consensus
human leukocyte interferon will ordinarily include all
known common amino acid residues of all subtypes. It
will be understood that the state of knowledge concerning
naturally-occurring subtype sequences is continuously
developing. New subtypes may be discovered which
may destroy the "commonality" of a particular residue
at a particular position. Polypeptides whose structures
are predicted on the basis of a later-amended determination
of commonality at one or more positions would
remain within the definition because they would nonetheless
predominantly include common amino acids and
because those amino acids no longer held to be common
would nonetheless quite likely represent the predominant
amino acid at the given positions. Failure of
a polypeptide to include either a common or predominant
amino acid at any given position would not remove the
molecule from the definition so long as the residue
at the position occurred in at least one subtype. Polypeptides
lacking one or more internal or terminal residues
of consensus human leukocyte interferon or including
internal or terminal residues having no counterpart
in any subtype would be considered analogs of human
consensus leukocyte interferon.

Published predicted amino acid sequences for
eight cDNA-derived human leukocyte interferon subtypes
were analyzed in the context of the identities of amino
acids within the sequence of 166 residues. See,
generally, Goeddel, et al., Nature, 290, pp. 20-26 (1981)
comparing LeIFN-A through LeIFN-H and noting that only
79 amino acids appear in identical positions in all
eight interferon forms and 99 amino acids appear in
identical positions if the E subtype (deduced from a
cDNA pseudogene) was ignored. Each of the remaining
positions was analyzed for the relative frequency of
occurrence of a given amino acid and, where a given
amino acid appeared at the same position in at least
five of the eight forms, it was designated as the
predominant amino acid for that position. A "consensus"
polypeptide sequence of 166 amino acids was plotted
out and compared back to the eight individual sequences,
resulting in the determination that LeIFN-F required
few modifications from its "naturally-occurring" form
to comply with the consensus sequence.
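
The selection rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy alignment columns are hypothetical and are not taken from the published subtype sequences. Only the rule itself comes from the text (a residue shared by every form is "common"; a residue found in at least five of the eight forms is "predominant").

```python
from collections import Counter


def classify_position(column, threshold=5):
    """Classify one alignment column (one residue letter per subtype).

    Returns ("common", aa) when every subtype carries the same residue,
    ("predominant", aa) when one residue occurs at least `threshold`
    times, and ("none", None) otherwise.
    """
    counts = Counter(column)
    residue, n = counts.most_common(1)[0]
    if n == len(column):
        return ("common", residue)
    if n >= threshold:
        return ("predominant", residue)
    return ("none", None)


def consensus(columns, threshold=5):
    """Apply the rule to every column of an alignment."""
    return [classify_position(col, threshold) for col in columns]


# Hypothetical columns for eight aligned subtypes (illustration only).
toy_columns = [
    "CCCCCCCC",  # identical in all eight forms
    "RRRRRGGR",  # R in six of eight forms
    "AATTATTT",  # T in five of eight forms
    "ASTVASTV",  # no residue reaches five of eight
]

for col, result in zip(toy_columns, consensus(toy_columns)):
    print(col, result)
```

Positions that come back as "none" are the ones for which the text leaves a choice open, constrained only by the requirement that the residue occur in at least one natural subtype.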

A program for construction of a manufactured
IFN-Con DNA sequence was developed and is set out
below in Table VIII. In the table, an asterisk
designates the variations in IFN-αF needed to develop
LeIFN-Con₁, i.e., to develop the [Arg²², Ala⁷⁶, Asp⁷⁸,
Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸]
analog of IFN-αF. The illustrated top strand sequence
includes, wherever possible, codons noted to be the
subject of preferential expression in E. coli. The
sequence also includes bases providing recognition sites
for SalI, HindIII, and BstEII at positions intermediate
the sequence and for XbaI and BamHI at its ends. The
latter sites are selected for use in incorporation of the
sequence in a pBR322 vector, as was the case with the
sequence developed for IFN-αF and its analogs.

TABLE VIII

-1  1                                   10
Met-Cys-Asp-Leu-Pro-Gln-Thr-His-Ser-Leu-Gly-Asn-Arg-Arg-
ATG TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC CGT CGC

                        20       *
Ala-Leu-Ile-Leu-Leu-Ala-Gln-Met-Arg-Arg-Ile-Ser-Pro-Phe-
GCT CTG ATT CTG CTG GCA CAG ATG CGT CGT ATT TCC CCG TTT

        30                                      40
Ser-Cys-Leu-Lys-Asp-Arg-His-Asp-Phe-Gly-Phe-Pro-Gln-Glu-
AGC TGC CTG AAA GAC CGT CAC GAC TTC GGC TTT CCG CAA GAA

                                50
Glu-Phe-Asp-Gly-Asn-Gln-Phe-Gln-Lys-Ala-Gln-Ala-Ile-Ser-
GAG TTC GAT GGC AAC CAA TTC CAG AAA GCT CAG GCA ATC TCT

                60
Val-Leu-His-Glu-Met-Ile-Gln-Gln-Thr-Phe-Asn-Leu-Phe-Ser-
GTA CTG CAC GAA ATG ATC CAA CAG ACC TTC AAC CTG TTT TCC

70                       *       *   *  80
Thr-Lys-Asp-Ser-Ser-Ala-Ala-Trp-Asp-Glu-Ser-Leu-Leu-Glu-
ACT AAA GAC AGC TCT GCT GCT TGG GAC GAA AGC TTG CTG GAG

         *             *90                       *
Lys-Phe-Tyr-Thr-Glu-Leu-Tyr-Gln-Gln-Leu-Asn-Asp-Leu-Glu-
AAG TTC TAC ACT GAA CTG TAT CAG CAG CTG AAC GAC CTG GAA

        100                                     110
Ala-Cys-Val-Ile-Gln-Glu-Val-Gly-Val-Glu-Glu-Thr-Pro-Leu-
GCA TGC GTA ATC CAG GAA GTT GGT GTA GAA GAG ACT CCG CTG

                                120
Met-Asn-Val-Asp-Ser-Ile-Leu-Ala-Val-Lys-Lys-Tyr-Phe-Gln-
ATG AAC GTC GAC TCT ATT CTG GCA GTT AAA AAG TAC TTC CAG

                130
Arg-Ile-Thr-Leu-Tyr-Leu-Thr-Glu-Lys-Lys-Tyr-Ser-Pro-Cys-
CGT ATC ACT CTG TAC CTG ACC GAA AAG AAA TAT TCT CCG TGC

140                                     150
Ala-Trp-Glu-Val-Val-Arg-Ala-Glu-Ile-Met-Arg-Ser-Phe-Ser-
GCT TGG GAA GTA GTT CGC GCT GAA ATT ATG CGT TCT TTC TCT

         *   *   *      160                     166 Stop
Leu-Ser-Thr-Asn-Leu-Gln-Glu-Arg-Leu-Arg-Arg-Lys-Glu
CTG TCT ACT AAC CTG CAG GAG CGT CTG CGC CGT AAA GAA TAA

Stop
TAG
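
As a consistency check on Table VIII, the following sketch translates the top strand and locates the three intermediate restriction sites named in the text. It is a minimal illustration, not part of the described procedure. The recognition sequences used (BstEII GGTNACC, HindIII AAGCTT, SalI GTCGAC) are the standard ones; the function names are hypothetical; and codon 86 is entered as TAC (tyrosine), which is an assumption about the intended codon at that position.

```python
import re

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Top strand of Table VIII, Met(-1) through both stop codons.
# Assumption: codon 86 is TAC (Tyr).
TABLE_VIII = """
ATG TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC CGT CGC
GCT CTG ATT CTG CTG GCA CAG ATG CGT CGT ATT TCC CCG TTT
AGC TGC CTG AAA GAC CGT CAC GAC TTC GGC TTT CCG CAA GAA
GAG TTC GAT GGC AAC CAA TTC CAG AAA GCT CAG GCA ATC TCT
GTA CTG CAC GAA ATG ATC CAA CAG ACC TTC AAC CTG TTT TCC
ACT AAA GAC AGC TCT GCT GCT TGG GAC GAA AGC TTG CTG GAG
AAG TTC TAC ACT GAA CTG TAT CAG CAG CTG AAC GAC CTG GAA
GCA TGC GTA ATC CAG GAA GTT GGT GTA GAA GAG ACT CCG CTG
ATG AAC GTC GAC TCT ATT CTG GCA GTT AAA AAG TAC TTC CAG
CGT ATC ACT CTG TAC CTG ACC GAA AAG AAA TAT TCT CCG TGC
GCT TGG GAA GTA GTT CGC GCT GAA ATT ATG CGT TCT TTC TCT
CTG TCT ACT AAC CTG CAG GAG CGT CTG CGC CGT AAA GAA TAA TAG
"""
SEQ = "".join(TABLE_VIII.split())

# Recognition sequences (N = any base for BstEII).
SITES = {"BstEII": "GGT[ACGT]ACC", "HindIII": "AAGCTT", "SalI": "GTCGAC"}


def translate(seq):
    """Translate in frame from the first base, stopping at the first stop."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def find_sites(seq):
    """Return the 0-based start offsets of each recognition sequence."""
    return {name: [m.start() for m in re.finditer(pattern, seq)]
            for name, pattern in SITES.items()}


protein = translate(SEQ)
print(len(protein), protein[:14])
print(find_sites(SEQ))
```

Because index 0 of the translated string holds Met(-1), `protein[n]` is residue n of the mature polypeptide, which makes it easy to spot-check the asterisked positions.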
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Table IX below sets out the specific double 
stranded DNA sequence for preparation of four subunit DNA
sequences for use in manufacture of IFN-Con₁. Subunit
LeuIFN-Con IV is a duplicate of LeuIFN-F IV of Table
VIII. Segments of subunits which differ from those
employed to construct the IFN-αF gene are designated
with a "prime" (e.g., 37' and 38' are altered forms
of sections 37 and 38 needed to provide arginine rather
than glycine at position 22).
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The four subunits of Table IX were sequentially 
inserted into an expression vector according to the
procedure of Example 7 to yield a vector having the coding
region of Table VIII under control of a trp promoter/
operator. The product of expression of this vector in
E. coli was designated IFN-Con₁. It will be noted that
this polypeptide includes all common residues indicated
in Goeddel, et al., supra, and, with the exception of
Ser⁸⁰, Glu⁸³, Val¹¹⁴, and Lys¹²¹, includes the
predominant amino acid indicated by analysis of the
reference's summary of sequences. The four above-noted
residues were retained from the native IFN-αF sequence to
facilitate construction of subunits and assembly of
subunits into an expression vector. (Note, e.g., serine
was retained at position 80 to allow for construction of
a HindIII site.)

Since publication of the Goeddel, et al.
summary of IFN-α subtypes, a number of additional
subtypes have been ascertained. Figure 2 sets out in
tabular form the deduced sequences of the 13 presently
known subtypes (exclusive of those revealed by five
known cDNA pseudogenes) with designations of the same
IFN-α subtypes from different laboratories indicated
parenthetically (e.g., IFN-α6 and IFN-αK). See, e.g.,
Goeddel, et al., supra; Stebbing, et al., in: Recombinant
DNA Products, Insulin, Interferons and Growth
Hormones (A. Bollon, ed.), CRC Press (1983); and
Weissmann, et al., U.C.L.A. Symp. Mol. Cell Biol., 25, pp.
295-326 (1982). Positions where there is no common
amino acid are shown in bold face. IFN-α subtypes are
roughly grouped on the basis of amino acid residues.
In seven positions (14, 16, 71, 78, 79, 83, and 160)
the various subtypes show just two alternative amino
acids, allowing classification of the subtypes into two
subgroups (I and II) based on which of the seven
positions are occupied by the same amino acid residues.
Three IFN-α subtypes (H, F and B) cannot be classified
as Group I or Group II and, in terms of distinguishing
positions, they appear to be natural hybrids of both
group subtypes. It has been reported that IFN-α
subtypes of the Group I type display relatively high
antiviral activity while those of Group II display
relatively high antitumor activity.

IFN-Con₁ structure is described in the final
line of the Figure. It is noteworthy that certain
residues of IFN-Con₁ (e.g., serine at position 8) which were
determined to be "common" on the basis of the Goeddel,
et al., sequences are now seen to be "predominant".
Further, certain of the IFN-Con₁ residues determined
to be predominant on the basis of the reference (Arg²²,
Asp⁷⁸, Glu⁷⁹, and Tyr⁸⁶) are no longer so on the basis
of updated information, while certain heretofore
non-predominant others (Ser⁸⁰ and Glu⁸³) now can be
determined to be predominant.

EXAMPLE 10

A human consensus leukocyte interferon which
differed from IFN-Con₁ in terms of the identity of amino
acid residues at positions 14 and 16 was prepared by
modification of the DNA sequence coding for IFN-Con₁.
More specifically, the expression vector for IFN-Con₁
was treated with BstEII and HindIII to delete subunit
LeuIFN-Con III. A modified subunit was inserted wherein
the alanine-specifying codon, GCT, of sections 39 and
40 was altered to a threonine-specifying codon, ACT,
and the isoleucine codon, ATT, was changed to ATG. The
product of expression of the modified manufactured gene,
[Thr¹⁴, Met¹⁶, Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰,
Leu⁹⁶, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸] IFN-αF, was designated
IFN-Con₂.
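
The two codon replacements of Example 10 can be illustrated on the opening of the Table VIII top strand. This is a sketch under stated assumptions: the helper names are hypothetical, only the first 28 codons of the table are used, and the codons found at positions 14 and 16 (GCT and ATT) are read from Table VIII.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First 28 codons of Table VIII: Met(-1) followed by residues 1 to 27.
CON1_START = (
    "ATG TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC CGT CGC "
    "GCT CTG ATT CTG CTG GCA CAG ATG CGT CGT ATT TCC CCG TTT"
).split()


def substitute(codons, residue, new_codon):
    """Return a copy with the codon for `residue` replaced.

    List index 0 holds Met(-1), so residue n sits at list index n.
    """
    out = list(codons)
    out[residue] = new_codon
    return out


def translate(codons):
    """Translate a list of codons to one-letter residues."""
    return "".join(CODON_TABLE[c] for c in codons)


# Ala14 (GCT) -> Thr (ACT) and Ile16 (ATT) -> Met (ATG).
con2_start = substitute(substitute(CON1_START, 14, "ACT"), 16, "ATG")

print(translate(CON1_START))
print(translate(con2_start))
```

Both changes fall between the BstEII site (codons 10 to 12) and the HindIII site (codons 79 to 80), which is consistent with replacing only the subunit bounded by those two sites.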
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Presently being constructed is a gene for a 
consensus human leukocyte interferon polypeptide which
will differ from IFN-Con₁ in terms of the identity of
residues at positions 114 and 121. More specifically,
the Val¹¹⁴ and Lys¹²¹ residues which duplicate IFN-αF
subtype residues but are not predominant amino acids
will be changed to the predominant Glu¹¹⁴ and Arg¹²¹
residues, respectively. Because the codon change from
Val¹¹⁴ to Glu¹¹⁴ (e.g., GTC to GAA) will no longer allow
for a SalI site at the terminal portion of subunit
LeuIFN-Con I (of Table IX), subunits I and II will
likely need to be constructed as a single subunit.
Changing the AAA, lysine, codon of sections 11 and 12
to CTG will allow for the presence of arginine at
position 121. The product of microbial expression of the
manufactured gene, [Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶,
Tyr⁹⁰, Leu⁹⁶, Glu¹¹⁴, Arg¹²¹, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸]
IFN-αF, will be designated IFN-Con₃.

The following example relates to procedures
for enhancing levels of expression of exogenous genes
in bacterial species, especially E. coli.

EXAMPLE 11

In the course of development of expression
vectors in the above examples, the trp promoter/operator
DNA sequence was employed which included a ribosome
binding site ("RBS") sequence in a position just prior
to the initial translation start (Met⁻¹, ATG). An
attempt was made to increase levels of expression of
the various exogenous genes in E. coli by incorporating
DNA sequences duplicative of portions of putative RBS
sequences extant in genomic E. coli DNA sequences
associated with highly expressed cellular proteins.
Ribosome binding site sequences of such protein-coding
genes as reported in Inokuchi, et al., Nuc. Acids Res.,
10, pp. 6957-6968 (1982), Gold, et al., Ann. Rev.
Microbiol., 35, pp. 365-403 (1981) and Alton, et al.,
Nature, 282, pp. 864-869 (1979), were reviewed and the
determination was made to employ sequences partially
duplicative of those associated with the E. coli proteins
OMP-F (outer membrane protein F), CRO and CAM
(chloramphenicol transacetylase).

By way of example, to duplicate a portion
of the OMP-F RBS sequence the following sequence is
inserted prior to the Met⁻¹ codon.

5'-AACCATGAGGGTAATAAATA-3'
3'-TTGGTACTCCCATTATTTAT-5'

In order to incorporate this sequence in
a position prior to the protein coding region of, e.g.,
the manufactured gene coding for IFN-Con₁ or IFN-αF₁,
subunit IV of the expression vector was deleted (by
cutting the vector with XbaI and BstEII) and replaced
with a modified subunit IV involving altered sections
41A and 42A and the replacement of sections 43 and
44 with new segments RB1 and RB2. The construction
of the modified sequence is as set out in Table X,
below.

TABLE X

XbaI                            -1   1   2
                                Met Cys Asp
                RB1
CTAGAAA CCA TGA GGG TAA TAA ATA ATG TGT GAT
    TTT GGT ACT CCC ATT ATT TAT TAC ACA CTA
                RB2

 3   4   5   6   7   8   9
Leu Pro Gln Thr His Ser Leu   BstEII
                41A
TTA CCT CAA ACT CAT TCT CTT G
AAT GGA GTT TGA GTA AGA GAA CATG
                42A
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Table XI, below, illustrates the entire DNA 
sequence in the region preceding the protein coding
region of the reconstructed gene starting with the HpaI
site within the trp promoter/operator (compare subunit
IF-4 of Table IV).

TABLE XI

HpaI                                    XbaI
AAC TAG TAC GCA AGT TCA CGT AAA AAG GGT ATC TAG AAA CCA
TTG ATC ATG CGT TCA AGT GCA TTT TTC CCA TAG ATC TTT GGT

                    -1   1   2   3   4   5   6   7
                    Met Cys Asp Leu Pro Gln Thr His
TGA GGG TAA TAA ATA ATG TGT GAT TTA CCT CAA ACT CAT
ACT CCC ATT ATT TAT TAC ACA CTA AAT GGA GTT TGA GTA

 8   9    BstEII
Ser Leu
TCT CTT G
AGA GAA CATG

Similar procedures were followed to incorporate
sequences duplicative of RBS sequences of CRO and
CAM genes, resulting in the following sequences
immediately preceding the Met⁻¹ codon.

      1        10        20
CRO:  GCATGTACTAAGGAGGTTGT
      CGTACATGATTCCTCCAACA

      1        10        20
CAM:  CAGGAGCTAAGGAAGCTAAA
      GTCCTCGATTCCTTCGATTT

It will be noted that all the RBS sequence inserts
possess substantial homology to Shine-Dalgarno
sequences, are rich in adenine and include sequences
ordinarily providing "stop" codons.
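
The features noted for the three inserts can be checked directly from the sequences given above. The sketch below is illustrative only: the helper names are hypothetical, and the use of AGGAGG as the Shine-Dalgarno core for comparison is an assumption made here rather than something stated in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Top and bottom strands as given in the text (bottom written 3' to 5').
RBS_INSERTS = {
    "OMP-F": ("AACCATGAGGGTAATAAATA", "TTGGTACTCCCATTATTTAT"),
    "CRO": ("GCATGTACTAAGGAGGTTGT", "CGTACATGATTCCTCCAACA"),
    "CAM": ("CAGGAGCTAAGGAAGCTAAA", "GTCCTCGATTCCTTCGATTT"),
}

STOP_TRIPLETS = {"TAA", "TAG", "TGA"}
SD_CORE = "AGGAGG"  # assumed Shine-Dalgarno core used for comparison


def complement(seq):
    """Base-by-base complement, written in the same left-to-right order."""
    return seq.translate(COMPLEMENT)


def adenine_fraction(seq):
    """Fraction of bases in the strand that are adenine."""
    return seq.count("A") / len(seq)


def stop_triplets(seq):
    """Stop triplets present in any reading frame of the strand."""
    triplets = {seq[i:i + 3] for i in range(len(seq) - 2)}
    return sorted(triplets & STOP_TRIPLETS)


def longest_sd_match(seq):
    """Longest contiguous piece of SD_CORE that occurs in the strand."""
    for length in range(len(SD_CORE), 0, -1):
        for start in range(len(SD_CORE) - length + 1):
            piece = SD_CORE[start:start + length]
            if piece in seq:
                return piece
    return ""


for name, (top, bottom) in RBS_INSERTS.items():
    print(name, complement(top) == bottom, adenine_fraction(top),
          stop_triplets(top), longest_sd_match(top))
```

The complement check also serves as a transcription check on the paired strands printed in the text.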
Levels of E. coli expression of IFN-Con₁ were
determined using trp-controlled expression vectors
incorporating the three RBS inserts (in addition to the
RBS sequence extant in the complete trp promoter/
operator). Expression of the desired polypeptide using the
OMP-F RBS duplicating sequence was at from 150-300 mg
per liter of culture, representing from 10 to 20 percent
of total protein. Vectors incorporating the CAM RBS
duplicating sequence provided levels of expression which
were about one-half that provided by the OMP-F variant.
Vectors including the CRO RBS duplicating sequence
yielded the desired protein at levels of about one-tenth
that of the OMP-F variant.

The following example relates to antiviral
activity screening of human leukocyte interferon and
polypeptides provided by the preceding examples.

EXAMPLE 12

Table XII below provides the results of
testing of antiviral activity in various cell lines of
natural (buffy coat) interferon and isolated,
microbially-expressed polypeptides designated IFN-αF₁,
IFN-αF₂, IFN-Con₁, and IFN-Con₂. Viruses used were VSV
(vesicular stomatitis virus) and EMCV
(encephalomyocarditis virus). Cell lines were from
various mammalian sources, including human (WISH, HeLa),
bovine (MDBK), mouse (MLV-6), and monkey (Vero).
Antiviral activity was determined by an end-point
cytopathic effect assay as described in Weck, et al.,
J. Gen. Virol., 57, pp. 233-237 (1981) and Campbell,
et al., Can. J. Microbiol., 21, pp. 1247-1253 (1975).
Data shown were normalized for antiviral activity in WISH
cells.
TABLE XII

        Cell    Buffy
Virus   Line    Coat    IFN-αF₁   IFN-αF₂   IFN-Con₁   IFN-Con₂

VSV     WISH     100      100       100       100        100
VSV     HeLa     400      100       ND*       200        100
VSV     MDBK    1600       33       ND        200        300
VSV     MLV-6     20        5       ND          3         20
VSV     Vero      10      0.1       ND         10        0.1
EMCV    WISH     100      100       100       100        100
EMCV    HeLa     100        5       ND         33         33
EMCV    Vero     100       20       ND       1000         10

*ND - no data presently available.
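
The normalization applied to Table XII can be sketched as follows. This is one reading of "normalized for antiviral activity in WISH cells", namely that each preparation's titers for a given virus are rescaled so that its WISH value equals 100. The function name and the raw titers below are hypothetical and are not data from the assays.

```python
def normalize_to_reference(titers, reference="WISH"):
    """Rescale titers so that the reference cell line reads 100."""
    ref = titers[reference]
    if ref <= 0:
        raise ValueError("reference titer must be positive")
    return {cell: 100.0 * value / ref for cell, value in titers.items()}


# Hypothetical raw titers (units/ml) for one preparation and one virus.
raw = {"WISH": 2.0e6, "HeLa": 8.0e6, "MDBK": 3.2e7}
normalized = normalize_to_reference(raw)
print(normalized)
```

On this reading, a value above 100 in a row indicates higher activity in that cell line than in WISH cells for the same preparation, and the columns are not directly comparable in absolute units.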



It will be apparent from the above examples
that the present invention provides, for the first time,
an entire new genus of synthesized, biologically active
proteinaceous products which products differ from
naturally-occurring forms in terms of the identity
and/or location of one or more amino acids and in terms
of one or more biological (e.g., antibody reactivity)
and pharmacological (e.g., potency or duration of effect)
properties but which substantially retain other such
properties.

Products of the present invention and/or
antibodies thereto may be suitably "tagged", for example
radiolabelled (e.g., with I¹²⁵), conjugated with enzymes
or fluorescently labelled, to provide reagent materials
useful in assays and/or diagnostic test kits, for the
qualitative and/or quantitative determination of the
presence of such products and/or said antibodies in
fluid samples. Such antibodies may be obtained from
the inoculation of one or more animal species (e.g.,
mice, rabbit, goat, human, etc.) or from monoclonal
antibody sources. Any of such reagent materials may be used
alone or in combination with a suitable substrate, e.g.,
coated on a glass or plastic particle bead.
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Numerous modifications and variations in the 
practice of the invention are expected to occur to those 
skilled in the art upon consideration of the foregoing 
illustrative examples. Consequently, the invention
should be considered as limited only to the extent
reflected by the appended claims.
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WHAT IS CLAIMED IS: 

1. A method for the manufacture of linear,
double stranded DNA sequences of a length in excess
of about 200 base pairs and coding for expression
of a predetermined continuous sequence of amino acids
within a selected host microorganism transformed by
a selected DNA vector including the sequence, said
method comprising:

(a) preparing two or more different, subunit,
linear, double stranded DNA sequences of from about
100 to about 200 base pairs in length,

each different subunit DNA sequence comprising
a series of nucleotide base codons coding for a
different continuous portion of said predetermined sequence
of amino acids to be expressed,

one terminal region of a first of said
subunits comprising a portion of a base sequence which
provides a recognition site for cleavage by a first
restriction endonuclease, which recognition site is
entirely present either once or not at all in said
selected assembly vector upon insertion of the subunit
therein,

one terminal region of a second of said
subunits comprising a portion of a base sequence which
provides a recognition site for cleavage by a second
restriction endonuclease other than said first
endonuclease, which recognition site is entirely present
once or not at all in said selected assembly vector
upon insertion of the subunit therein,

at least one-half of all remaining terminal
regions of subunits comprising a portion of a
recognition site for restriction endonuclease cleavage by
an endonuclease other than said first and second
endonucleases, which recognition site is entirely present
once and only once in said selected assembly vector
after insertion of all subunits thereinto; and

(b) serially inserting each of said subunit
DNA sequences prepared in step (a) into the selected
assembly vector and effecting the biological
amplification of the assembly vector subsequent to each
insertion, thereby to form a DNA vector including the
desired DNA sequence coding for the predetermined
continuous amino acid sequence and wherein the desired DNA
sequence assembly includes at least one unique
recognition site for restriction endonuclease cleavage at an
intermediate position therein.

2. A method according to claim 1 wherein
the restriction site for endonuclease cleavage by
the restriction endonuclease other than said first
and second endonucleases is a palindromic six base
recognition site and the desired DNA sequence assembled
has at least one unique six base recognition site
for restriction endonuclease cleavage at an intermediate
position therein.

3. A method according to claim 1 wherein
at least three different subunit DNA sequences are
prepared in step (a) and serially inserted into said
selected vector in step (b) and the desired DNA sequence
obtained includes at least two unique restriction
endonuclease recognition sites at intermediate positions
therein.

4. A method according to claim 1 wherein
the DNA sequence manufactured comprises an entire
structural gene coding for a biologically active
polypeptide.

5. A method according to claim 1 wherein,
in the DNA sequence manufactured, the sequence of
nucleotide bases includes one or more codons selected,
from among alternative codons specifying the same
amino acid, on the basis of preferential expression
characteristics of the codon in said selected host
microorganism.

6. A manufactured, linear, double stranded
DNA sequence of a length in excess of about 200 base
pairs and coding for the expression of a predetermined
continuous sequence of amino acids by a selected host
microorganism transformed with a selected DNA vector
including the sequence, said sequence characterized
by having at least one unique palindromic six base
recognition site for restriction endonuclease cleavage
at an intermediate position therein.

7. A DNA sequence according to claim 6
characterized by having two or more unique restriction
endonuclease cleavage sites at intermediate positions
therein, at least one of which has a six base palindromic
recognition site.

8. A DNA sequence according to claim 6
which comprises an entire structural gene coding for
a biologically active polypeptide.

9. A DNA sequence according to claim 8
wherein the gene codes for a human polypeptide.

10. A DNA sequence according to claim 8
wherein the gene codes for a polypeptide which differs
from a naturally-occurring human polypeptide in terms
of the identity and/or location of one or more amino
acids.

11. A DNA sequence according to claim 6
wherein the sequence of nucleotide bases includes
at least one or more codons selected, from among
alternative codons specifying the same amino acids, on the
basis of preferential expression characteristics of
the codon in said selected host microorganism.

12. A polypeptide product of the expression
by an organism of a manufactured DNA sequence according
to claim 6.

13. A biologically functional DNA microorganism
transformation vector including a manufactured
DNA sequence according to claim 6.

14. A vector according to claim 12 which
is a circular DNA plasmid.

15. A manufactured gene capable of directing
the synthesis in a selected host microorganism of
human immune interferon.

16. A manufactured gene according to claim
15 having at least one unique six base recognition
site for restriction endonuclease cleavage at an
intermediate position therein.

17. A manufactured gene according to claim
15 wherein the base sequence includes one or more
codons selected, from among alternative codons
specifying the same amino acid, on the basis of preferential
expression characteristics of the codon in a projected
host microorganism.

18. A manufactured gene according to claim
17 wherein the base sequence includes one or more
codons selected, from among alternative codons
specifying the same amino acid, on the basis of preferential
expression characteristics of the codon in E. coli.
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19. A manufactured gene according to claim 
15 wherein the base sequence, commencing with the
5' end of the top strand, is as follows:

1
TGT-TAC-TGC-CAG-GAT-CCG-TAC-GTT-AAG-GAA-GCA-GAA-AAC-CTG-
15
AAA-AAA-TAC-TTC-AAC-GCA-GGC-CAC-TCC-GAC-GTA-GCT-GAT-AAC-
29
GGC-ACC-CTG-TTC-CTG-GGT-ATC-CTA-AAA-AAC-TGG-AAA-GAG-GAA-
43
TCC-GAC-CTG-AAG-ATC-ATG-CAG-TCT-CAA-ATT-GTA-AGC-TTC-TAC-
57
TTC-AAA-CTG-TTC-AAG-AAC-TTC-AAA-GAC-GAT-CAA-TCC-ATC-CAG-
71
AAG-AGC-GTA-GAA-AGT-ATT-AAG-GAG-GAC-ATG-AAC-GTA-AAA-TCC-
85
TTT-AAC-AGC-AAC-AAG-AAG-AAA-CGC-GAT-GAC-TTC-GAG-AAA-CTG-
99
ACT-AAC-TAC-TCT-GTT-ACA-GAT-CTG-AAC-GTG-CAG-CGT-AAA-GCT-
113
ATT-CAC-GAA-CTG-ATC-CAA-GTT-ATG-GCT-GAA-CTG-TCT-CCT-GCG-
127
GCA-AAG-ACT-GGC-AAA-CGC-AAG-CGT-AGC-CAG-ATG-CTG-TTT-CAG-
141
[ or CGT]-CGT-CGC-CGT-GCT-TCT-CAG.

20. A manufactured gene according to claim
19 wherein the base codon for amino acid 41 is GAA and
for amino acid 46 is CGG.

21. A biologically functional DNA microorganism
transformation vector including a manufactured gene
according to claim 15.
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22. A vector according to claim 21 which is 
a circular DNA plasmid. 

23. A process for the production of human
immune interferon comprising:

growing, under appropriate nutrient conditions,
microorganisms transformed with biologically functional
DNA including a manufactured gene according to claim 15,
whereby said microorganisms express said gene and
produce authentic human immune interferon.

24. A process according to claim 23 wherein
the microorganisms grown are E. coli microorganisms.

25. A manufactured gene capable of directing
the synthesis in a selected host microorganism of a
polypeptide which differs from human immune interferon
in terms of the identity and/or location of one or more
amino acids.

26. A manufactured gene according to claim
25 having at least one unique six base recognition site
for restriction endonuclease cleavage at an intermediate
position therein.

27. A manufactured gene according to claim
25 wherein the base sequence includes one or more codons
selected, from among alternative codons specifying the
same amino acid, on the basis of preferential expression
characteristics of the codon in a projected host
microorganism.

28. A manufactured gene according to claim
27 wherein the base sequence includes one or more codons
selected, from among alternative codons specifying the
same amino acid, on the basis of preferential expression
characteristics of the codon in E. coli.



WO 83/04053



PCT/US83/00605 



- 83 - 

29. A biologically functional DNA microorganism
transformation vector including a manufactured gene
according to claim 25.

30. A vector according to claim 29 which is
a circular DNA plasmid.

31. A process for the production of a
polypeptide analog of human immune interferon comprising:

growing, under appropriate nutrient conditions,
microorganisms transformed with biologically functional
DNA including a manufactured gene according to claim 25,
whereby said microorganisms express said gene and produce
authentic human immune interferon.

32. A process according to claim 31 wherein
the microorganisms grown are E. coli microorganisms.

33. A polypeptide product of the process of
claim 23.

34. A polypeptide product of the process of
claim 31.

35. A polypeptide product according to claim
34 selected from the group consisting of:
[Lys^^lIPNy; 



[Ser^, Ser^, Lys^^JIFNy; 



[Lys^, Lys^, Gln^, Lys^-""] IFNy ; 
[des-Cys^, des-Tyr^, des-Cys^, Lys lIFNy; 



[Phe^^, Lys^-'-JIFNY; 
[Thr^^, Lys^^llFNy; 
[Lys^^, Cys^^llFNY; 
[Cys^^lIFNy; 
35 [Cys^^, Pro^°^lIFNY; 

[Gln^^lIFNy; and 
IGlu^lIFNy. 
36. A manufactured gene capable of directing
the synthesis in a selected host microorganism of
consensus human leukocyte interferon.

37. Consensus human leukocyte interferon.

38. A consensus human leukocyte interferon
according to claim 37 selected from the group consisting
of:

[Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰,
Leu⁹⁶, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸] IFN-αF;

[Thr¹⁴, Met¹⁶, Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹,
Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸] IFN-αF; and

[Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰,
Leu⁹⁶, Glu¹¹⁴, Arg¹²¹, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸] IFN-αF.

39. A manufactured gene capable of directing
the synthesis in a selected host microorganism of human
leukocyte interferon subtype F.



40. A manufactured gene according to claim
39 wherein the base sequence of the top strand is as
follows:

5'-TGT-GAT-TTA-CCT-CAA-ACT-CAT-TCT-CTT-GGT-AAC-CGT-CGC-
GCT-CTG-ATT-CTG-CTG-GCA-CAG-ATG-GGT-CGT-ATT-TTC-CCG-TTT-
AGC-TGC-CTG-AAA-GAC-CGT-CAC-GAC-TTC-GGC-TTT-CCG-CAA-GAA-
GAG-TTC-GAT-GGC-AAC-CAA-TTC-CAG-AAA-GCT-CAG-GCA-ATC-TCT-
GTA-CTG-CAC-GAA-ATG-ATC-CAA-CAG-ACC-TTC-AAC-CTG-TTT-TCC-
ACT-AAA-GAC-AGC-TCT-GCT-ACC-TGG-GAA-CAA-AGC-TTG-CTG-GAG-
AAG-TTC-TCC-ACT-GAA-CTG-AAC-CAG-CAG-CTG-AAC-GAC-ATG-GAA-
GCA-TGC-GTA-ATC-CAG-GAA-GTT-GGT-GTA-GAA-GAG-ACT-CCG-CTG-
ATG-AAC-GTC-GAC-TCT-ATT-CTG-GCA-GTT-AAA-AAG-TAC-TTC-CAG-
CGT-ATC-ACT-CTG-TAC-CTG-ACC-GAA-AAG-AAA-TAT-TCT-CCG-TGC-
GCT-TGG-GAA-GTA-GTT-CGC-GCT-GAA-ATT-ATG-CGT-TCT-TTC-TCT-
CTG-AGC-AAA-ATC-TTC-CAG-GAG-CGT-CTG-CGC-CGT-AAA-GAA-3'
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41. A process for production of human leuko- 
cyte interferon subtype F comprising: 

growing, under appropriate nutrient conditions,
microorganisms transformed with biologically functional
DNA including a manufactured gene according to claim 39,
whereby said microorganisms express said gene and produce
human leukocyte interferon subtype F.

42. The polypeptide product of the process
of claim 41.

43. A manufactured gene capable of directing
the synthesis in a selected host microorganism of a
polypeptide which differs from human leukocyte interferon
subtype F in terms of the identity and/or location of
one or more amino acids.

44. A process for production of a polypeptide
which differs from human leukocyte interferon subtype
F in terms of the identity and/or location of one or
more amino acids comprising:

growing, under appropriate nutrient conditions,
microorganisms transformed with biologically functional
DNA including a manufactured gene according to claim 43,
whereby said microorganisms express said gene and produce
said polypeptide.

45. A polypeptide product of the process of
claim 44.

46. A polypeptide product according to claim
45 which is [Thr¹⁴, Met¹⁶] IFN-αF.

47. A reagent material comprising a
radiolabelled manufactured DNA sequence according to claim
6.
48. A reagent material according to claim 47
wherein said radiolabel is I¹²⁵.

49. A reagent material comprising a tagged
antibody to a polypeptide according to claim 12, coated
on the surface of a plastic bead.

50. A microbially synthesized, biologically
active proteinaceous product which differs from a
naturally-occurring form thereof in terms of the identity
and/or location of one or more amino acids and in terms
of one or more biological and pharmacological properties
but which substantially retains other such properties
of the naturally-occurring form.

51. A polypeptide product according to claim
50 selected from the group consisting of:

[Lys^-^-JIFNy; 
. [Ser^, Ser^r Lys^^lIFNy; 

20 [Lys^r Lys^r Gln^, Lys^^lIFNy; 

1 2 3 81 

[des-Cys , des-Tyr , des-Cys , Lys ]lFUyj 

[Phe^^, Lys^-'-lIPNy; 

[Thr*^, Lys^^lIFNy; 

[Lys^^, Cys^^lIPNY; 

25 ICys^^]IPNY; 

[Cys^^, Pro-'-^^lIFNY; 

[Glu^JIFNy; and « 



30 



[Thr^*, Met-'-^lIFN-ctP. 



52. A process for enhancing the expression
of an exogenous, vector-borne gene in E. coli, said
process comprising:

inserting in said vector, in a position
upstream of the exogenous gene protein coding sequence,
a DNA sequence comprising base pairs duplicative of
ribosome binding site base pairs associated with E. coli
synthesis of OMP-F, CRO and CAM gene products.

53. The process of claim 52 wherein the
inserted DNA sequence is a sequence associated with
E. coli synthesis of OMP-F gene products.

54. The process of claim 53 wherein the
inserted DNA sequence is:

   1        10        20
5'-AACCATGAGGGTAATAAATA-3'
3'-TTGGTACTCCCATTATTTAT-5'.

55. The process of claim 53 wherein the
inserted DNA sequence follows a trp promoter/operator
sequence.
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